FOREWORD



This document describes the development and field testing of a trial
battery of newly constructed predictor measures for evaluating the potential
performance of Army applicants. The research was part of the Army's current,
large-scale manpower and personnel effort for improving the selection,
classification, and utilization of Army enlisted personnel. The thrust for
the project came from the practical, professional, and legal need to validate
the Armed Services Vocational Aptitude Battery (ASVAB, the current U.S.
military selection/classification test battery) and other selection variables
as predictors of training and performance. The portion of the effort described
herein is devoted to the development and validation of Army Selection and
Classification Measures, and is referred to as "Project A." Another part of
the effort is the development of a prototype Computerized Personnel Allocation
System, referred to as "Project B." Together, these Army Research Institute
research efforts, with their in-house and contract components, comprise a
landmark program to develop a state-of-the-art, empirically validated
personnel selection, classification, and allocation system.



EDGAR M. JOHNSON 
Technical Director 
DEVELOPMENT AND FIELD TEST OF THE TRIAL BATTERY FOR PROJECT A
EXECUTIVE SUMMARY 



Requirement: 



Project A is a large-scale, multiyear research program intended to improve
[...]. [...] developed experimental predictor measures. This report
describes the development and field test of the newly developed predictor
measures.

Procedure: 

Early activities included a large-scale literature review, the collection
[...] of tests and inventories identified [...], and the construction and
administration of a preliminary battery of "off-the-shelf" tests and
inventories. These activities [...] of cognitive [...] biographical data
[...], and computer-administered measures of perceptual/psychomotor
abilities.

These new measures were developed in an iterative manner. The measures
[...] to pilot tests, with revisions occurring between each pilot test.
[...] final revisions were made [...] administered [...] field test.

[...] test, several analyses and evaluations [...] test reliabilities [...]
score distributions and various types of [...]. The extent to which each
new test or scale measured an ability not presently measured by the ASVAB
(called uniqueness) [...]. The way the new measures related to each other
and to the ASVAB subtests was analyzed. Investigations were made of the
effect of practice and idiosyncrasies of testing stations on
computer-administered tests. The effects of faking on the temperament,
biodata, and vocational interest measures were also investigated.



Findings: 

The intended objectives of the research were realized. The newly devel- 
oped predictor measures were shown to have adequate to excellent psychometric 
properties (that is, sufficiently large score distributions and acceptably high 
reliabilities), to be relatively unique (that is, to measure abilities not
measured by the ASVAB), to be not unduly affected by practice, and not largely
affected by faking in an applicant-like setting. Also, preliminary methods for
detecting and correcting for faking were shown to be effective. 

The final set of measures, called the Trial Battery, contains six
paper-and-pencil cognitive ability tests, 10 computer-administered tests of
perceptual/psychomotor ability, and two paper-and-pencil inventories containing over
30 scales that measure temperament, biographical data, and vocational inter- 
ests. The entire battery requires about 4 hours of time to administer. 



Utilization of Findings: 

The Trial Battery will be used in the Concurrent Validation Phase of 
Project A. Soldiers' scores on the Trial Battery will be compared to their
scores on job performance criterion measures (also developed by Project A) to 
evaluate the validity of the Trial Battery and to evaluate the extent to which 
it improves the prediction of job performance over that achieved by the ASVAB. 
OVERVIEW OF PROJECT A



Project A is a comprehensive long-range research and development 
program which the U.S. Army has undertaken to develop an improved personnel
selection and classification system for enlisted personnel. The Army's
goal is to increase its effectiveness in matching first-tour enlisted 
manpower requirements with available personnel resources, through use of 
new and improved selection/classification tests which will validly predict 
carefully developed measures of job performance. The project addresses the 
675,000-person enlisted personnel system of the Army, encompassing several 
hundred different military occupations. 

This research program began in 1980, when the U.S. Army Research 
Institute (ARI) started planning the extensive research effort that would
be needed to develop the desired system. In 1982 a consortium led by the
Human Resources Research Organization (HumRRO) and including the American
Institutes for Research (AIR) and the Personnel Decisions Research
Institute (PDRI) was selected by ARI to undertake the 9-year project. The
total project utilizes the services of 40 to 50 ARI and consortium
researchers working collegially in a variety of specialties, such as
industrial and organizational psychology, operations research, management
science, and
computer science. 

The specific objectives of Project A are to: 

• Validate existing selection measures against both existing and
project-developed criteria. The latter are to include both Army- 
wide job performance measures based on newly developed rating 
scales, and direct hands-on measures of MOS-specific task perfor- 
mance. 

• Develop and validate new selection and classification measures. 

• Validate intermediate criteria (e.g., performance in training) as 
predictors of later criteria (e.g., job performance ratings), so 
that better informed reassignment and promotion decisions can be 
made throughout a soldier's career. 

• Determine the relative utility to the Army of different performance 
levels across MOS.

• Estimate the relative effectiveness of alternative selection and 
classification procedures in terms of their validity and utility 
for making operational selection and classification decisions. 

The research design for the project incorporates three main stages of 
data collection and analysis in an iterative progression of development,
testing, evaluation, and further development of selection/classification 
instruments (predictors) and measures of job performance (criteria). In
the first iteration, file data from Army accessions in fiscal years (FY) 
1981 and 1982 were evaluated to explore the relationships between the 
scores of applicants on the Armed Services Vocational Aptitude Battery 
(ASVAB), and their subsequent performance in training and their scores on 
the first-tour Skills and Qualification Tests (SQT). 

In the second iteration, a concurrent validation design will be executed
with FY83/84 accessions. As part of the preparation for the Concurrent
Validation, a "preliminary battery" of perceptual, spatial,
temperament/personality, interest, and biodata predictor measures was
assembled and used to test several thousand soldiers as they entered in four
Military Occupational Specialties (MOS). The data from this "preliminary
battery sample," along with information from a large-scale literature review
and a set of structured expert judgments, were then used to identify "best
bet" measures. These "best bet" measures were developed, pilot tested, and
refined. The refined test battery was then field tested to assess
reliabilities, "fakability," practice effects, and so forth. The resulting
predictor battery, now called the "Trial Battery," which includes
computer-administered perceptual and psychomotor measures, will be
administered together with a comprehensive set of job performance indices
based on job knowledge tests, hands-on job samples, and performance rating
measures in the Concurrent Validation.

In the third iteration (the Longitudinal Validation), all of the
measures, refined on the basis of experience in field testing and the
Concurrent Validation, will be administered in a true predictive validity
design. About 50,000 soldiers across 20 MOS will be included in the FY86-87
"Experimental Predictor Battery" administration and subsequent first-tour
measurement. About 3500 of these soldiers are expected to be available for
second-tour performance measurement in FY91.

For both the concurrent and longitudinal validations, the sample of
MOS was specially selected as a representative sample of the Army's 250+ 
entry-level MOS. The selection was based on an initial clustering of MOS 
derived from rated similarities of job content. These MOS account for 
about 45 percent of Army accessions. Sample sizes are sufficient so that 
race and sex fairness can be empirically evaluated in most MOS. 

Activities and progress during the first two years of the project were 
reported for FY83 in ARI Research Report 1347 and its Technical Appendix, 
ARI Research Note 83-37, and for FY84 in ARI Research Report 1393 and its 
related reports, ARI Technical Report 660 and ARI Research Note 85-14. 
Other publications on specific activities during those years are listed in 
those annual reports. The annual report on project-wide activities during 
FY85 is under preparation. 

For administrative purposes, Project A is divided into five research
tasks:

Task 1 -- Validity Analyses and Data Base Management 
Task 2 -- Developing Predictors of Job Performance 
Task 3 -- Developing Measures of School/Training Success 
Task 4 -- Developing Measures of Army-Wide Performance 
Task 5 -- Developing MOS-Specific Performance Measures 
The development and revision of the wide variety of predictor and
criterion measures reached the stage of extensive field testing during FY84
and the first half of FY85. These field tests resulted in the formulation of
the test batteries that will be used in the comprehensive Concurrent
Validation program which is being initiated in FY85.

The present report is one of five reports prepared under Tasks 2-5 to 
report the development of the measures and the results of the field tests, 
and to describe the measures to be used in Concurrent Validation. The five 
reports are: 

Task 2 -- "Development and Field Test of the Trial Battery for
Project A," Norman G. Peterson, Editor, ARI Technical Report
739, May 1987.

Task 3 -- "Development and Field Test of Job-Relevant Knowledge Tests
for Selected MOS," by Robert H. Davis, et al., ARI Technical
Report in preparation.

Task 4 -- "Development and Field Test of Army-Wide Rating Scales and the
Rater Orientation and Training Program," by Elaine D. Pulakos
and Walter C. Borman, Editors, ARI Technical Report 716,
October 1985.

Task 5 -- "Development and Field Test of Task-Based MOS-Specific
Criterion Measures," Charlotte H. Campbell, et al., ARI
Technical Report 717, October 1985.

-- "Development and Field Test of Behaviorally Anchored Rating
Scales for Nine MOS," Jody L. Toquam, et al., ARI Technical
Report in preparation.
THEORETICAL APPROACH, RESEARCH DESIGN AND ORGANIZATION, AND DESCRIPTION
OF INITIAL RESEARCH ACTIVITIES

Norman G. Peterson



TASK 2: APPROACH AND RESEARCH DESIGN 



As described in the Overview, Project A is organized into five re- 
search tasks, and activities of Task 2 are the focus of this report. Task 
2's specific objective is the development and validation of new (or im- 
proved) selection and classification measures. 

At present, the U.S. Army has a large number of jobs (called Military
Occupational Specialties or MOS) and hires, almost exclusively, inexperi- 
enced and untrained persons to fill those jobs. As obvious as these facts 
are, they need to be stated because they are the overriding facts that have 
to be addressed by Task 2 research. 

One implication of these facts is that a highly varied set of individual
differences variables must be put into use if there is to be a
reasonable chance of improving the present level of accuracy of predicting 
training performance, job performance, and attrition/retention in a sub- 
stantial proportion, if not all, of those jobs. Much less evident is the 
particular content of that set of individual differences variables, and the 
way the set should be developed and organized. 

A second, and perhaps less obvious, implication is the notion that new
predictor measures must be appropriate for selecting persons who do not
have the training and experience to immediately begin performing their
assigned jobs. This is true partly because of the vast numbers of job
positions that need to be filled, partly because of the kinds of jobs found
in the Army (infantry, artillery, etc.), and partly because of the
population of persons that the Army draws from (young high-school graduates
with little or no specialized training and job experience).

Theoretical Approach

These considerations led us to adopt a construct-oriented strategy of
predictor development, but with a healthy leavening from the
content-oriented strategy. Essentially, we endeavored to build up a model of
the predictor space by (1) identifying the relatively independent domains
or types of individual differences constructs that existed; (2) selecting
measures of constructs within each domain that met a number of psychometric
and pragmatic criteria; and (3) further selecting those constructs that
appeared to be the "best bets" for incrementing (over present predictors)
the prediction of the set of criteria of concern (i.e., training/job
performance and attrition/retention in Army jobs).

This approach would, we hoped, lead to the selection of a finite
set of relatively independent predictor constructs that were also
relatively independent of present predictors and maximally related to the
criteria of interest. If these conditions were met, then the resulting set
[...] jobs.

The development of such a model also had the virtue that it could be
at least partially "tested" at many points during the research effort, and
not just at the end, when all the predictor and criterion data are in. For
example, we could examine the covariance of newly developed measures with
one another and with the present predictors, notably the Armed Services
Vocational Aptitude Battery (ASVAB). If the new measures were not
relatively [...] improvements could be implemented [...] straightforwardly.





[...] a matrix, with hierarchical [...] appropriate levels of specificity
for a particular [...] as we do the research, or for future applications of
measures. (See Peterson and Bownas, 1982, for further discussion of this
type of model.)
This theoretical approach led to the delineation of seven more concrete
objectives of our research. These were:

1. Identify measures of human abilities, attributes, or characteristics
which are most likely to be effective in predicting, prior to
entry into the Army, successful soldier performance in general, and
in classifying persons into MOS where they will be most successful,
with special emphasis on attributes not tapped by current
pre-enlistment measures.

2. Design and develop new measures or modify existing measures of
these "best bet" predictors.
[Figure 1.1 is a matrix of predictor constructs (rows) by criterion
categories (columns). Predictor rows shown: Cognitive (Verbal, ...);
Psychomotor (Precision, Coordination, Dexterity); Temperament
(Dependability, Dominance, Sociability); Interests (Realistic, Artistic,
Social). Criterion columns shown: Training Performance (Pass/Fail, Test
Grades, Attendance); Job Task Performance (Common Tasks, Specific Tasks);
Attrition/Retention (Finish Term, Reenlist, Early Discharge). Cell entries
denote expected strength of relationship: High, Medium, Low.]

Figure 1.1. Illustrative construct-oriented model.
3. Develop materials and procedures for efficiently administering
experimental predictor measures in the field. 

4. Estimate and evaluate the reliability of the new pre-enlistment mea- 
sures and their vulnerability to motivational set differences, 
faking, variances in administrative settings, and practice effects. 

5. Determine the interrelationships (or covariance) between the new 
pre-enlistment measures and current pre-enlistment measures. 

6. Determine the degree to which the validity of new pre-enlistment
measures generalizes across MOS, that is, proves useful for
predicting measures of successful soldier performance across quite
different MOS and, conversely, the degree to which the measures are
useful for classification or the differential prediction of success
across MOS.

7. Determine the extent to which new pre-enlistment measures increase 
the accuracy of prediction of success and the accuracy of classifi- 
cation into MOS over and above the levels of accuracy reached by 
current pre-enlistment measures. 
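
Objective 7 concerns what is usually called incremental validity. As a minimal sketch, assuming a single current composite score and a single new measure (the report does not specify its procedure in this section), the gain in squared multiple correlation can be computed from the three pairwise correlations. The function names and the scores below are hypothetical and are not Project A data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary")
    return sxy / math.sqrt(sxx * syy)


def incremental_r2(criterion, current, new):
    """Return (R2 with current predictor, R2 with both, gain).

    Uses the two-predictor formula
    R2 = (r_yc^2 + r_yn^2 - 2 r_yc r_yn r_cn) / (1 - r_cn^2).
    """
    r_yc = pearson_r(criterion, current)
    r_yn = pearson_r(criterion, new)
    r_cn = pearson_r(current, new)
    if abs(r_cn) >= 1.0:
        raise ValueError("predictors are perfectly collinear")
    r2_current = r_yc ** 2
    r2_both = (r_yc ** 2 + r_yn ** 2 - 2 * r_yc * r_yn * r_cn) / (1 - r_cn ** 2)
    return r2_current, r2_both, r2_both - r2_current


# Hypothetical scores for six soldiers (illustration only).
current_composite = [48, 52, 55, 60, 63, 70]
new_measure = [12, 9, 15, 11, 17, 14]
performance = [3.1, 2.9, 3.8, 3.4, 4.3, 4.1]

r2_cur, r2_both, gain = incremental_r2(performance, current_composite, new_measure)
print(f"R2 current={r2_cur:.3f}  R2 both={r2_both:.3f}  gain={gain:.3f}")
```

With several current subtests and several new measures, the same comparison would be made with full multiple regressions; the two-predictor form is used here only to keep the sketch short.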



Research Design 

To achieve these objectives, we have followed the design depicted in 
Figure 1.2. There are 15 subtasks in our actual research plan, each tied 
to one or more of the activities or products shown in Figure 1.2. 

Several things, we feel, are noteworthy about the design. First, five 
test batteries are mentioned: Preliminary Battery, Demo Computer Battery, 
Pilot Trial Battery, Trial Battery, and Experimental Battery. These appear 
successively in time and allow us to modify and improve our predictors as 
we gather and analyze data on each successive battery or set of measures. 

Second, a large-scale literature review and a quantified expert judg- 
ment process were utilized early in the project to take maximum advantage 
of earlier research and accumulated knowledge and expert opinion. The 
expert judgment process was used to develop an early model of both the
predictor space and the criterion space, and relied heavily on the informa- 
tion gained from the literature review. By using the model that resulted 
from analyses of the experts' judgments of the relationships between pre- 
dictor constructs and criterion dimensions, we were able to develop, care- 
fully and efficiently, measures of the most promising predictor constructs. 

Third, the design includes both predictive (for the Preliminary and 
Experimental Batteries) and concurrent (for the Trial Battery) validation 
modes of data collection, although that is not obvious from Figure 1.2. 
Thus, we are able to benefit from the advantages of both types of designs,
that is, early collection and analysis of empirical criterion-related 
validities in the case of the concurrent design, and less concern about 
range restriction and experiential effects in the predictive design. 
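
The range restriction concern can be made concrete with a small sketch. The report does not say in this section which correction, if any, was applied; the function below uses the standard univariate correction for direct restriction on the predictor (often called Thorndike Case II) purely as an illustration. The function name and the numbers are hypothetical.

```python
import math


def correct_for_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Estimate the unrestricted correlation from a restricted-sample one.

    r_c = r * u / sqrt(1 - r^2 + r^2 * u^2), with u = SD_unrestricted / SD_restricted.
    """
    if sd_unrestricted <= 0 or sd_restricted <= 0:
        raise ValueError("standard deviations must be positive")
    if not -1.0 <= r_restricted <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    u = sd_unrestricted / sd_restricted
    r = r_restricted
    return (r * u) / math.sqrt(1.0 - r * r + r * r * u * u)


# Hypothetical case: validity of .30 observed among selected soldiers whose
# predictor SD (7.0) is smaller than the applicant-pool SD (10.0).
observed_r = 0.30
corrected_r = correct_for_range_restriction(
    observed_r, sd_unrestricted=10.0, sd_restricted=7.0
)
print(round(corrected_r, 3))
```

When the two standard deviations are equal the function returns the observed correlation unchanged, which is a quick sanity check on the formula.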
Figure 1.2. Flow chart of predictor measure development activities of
Project A. 
Organization

We organized Task 2 researchers into three "domain teams" as we worked
our way through this research design and toward the earlier described
research objectives. One team concerned itself with the temperament,
biographical data, and vocational interest variables and came to be called
the "non-cognitive" team. Another team concerned itself with cognitive and
perceptual kinds of variables and was called the "cognitive" team. The
third team concerned itself with psychomotor and perceptual variables and
was labeled the "psychomotor" team or sometimes the "computerized" team,
since all the measures developed by that team were computer-administered.
TASK 2: PROGRESS SUMMARY



One gauge of progress is the degree to which the seven research objec- 
tives presented earlier have been accomplished. Following is a short 
summary of progress in terms of those objectives.

1. Identify "best bet" measures--This objective has been met. We
sifted through a mountain of literature, translating the information
onto a common form that enabled us to evaluate constructs and
measures in terms of several psychometric and pragmatic criteria.
The results of that effort fed into the expert judgment process
wherein 35 personnel psychologists provided the data necessary to
develop our first model of the predictor space. After further
review by experienced researchers in the Army and an advisory
group, a set of "best bet" constructs was settled on. We also made
some field visits to observe combat arms jobs first-hand, in
addition to receiving criterion-side information from other Project A
researchers; all of this information was very useful in developing
new measures.

2. Develop measures of "best bet" predictors--This objective was
accomplished by following the blueprint provided from the first
objective. We carried out many small and not-so-small sample
tryouts of these measures as they were developed, as is documented
in the remainder of this report. The Trial Battery is the tangible
product of meeting this objective.

3. Develop procedures for efficiently administering predictor
measures--As anyone who has done research in military settings is
aware, soldiers' time is precious and awarded research time is not
to be squandered. We think we have developed and implemented
effective methods for getting maximum quality and quantity of data
out of our data collection efforts. The favorable results we have
so far achieved in completeness and usefulness of data are due in
large part, we think, to the attention paid to this objective.

4. Estimate reliability and vulnerability of measures--This objective 
has also been largely accomplished. We can report that analyses to 
date indicate that the new measures are psychometrically sound and 
acceptably invulnerable to the various sources of measurement problems--or
we have devised some ways to adjust for such effects.
However, more specifically targeted research would be useful in 
this area. 

5. Determine the interrelationships between the new measures and current
pre-enlistment measures--Work still remains on this objective,
but the data collected to date show that the new measures have much 
variance that is not shared with the ASVAB, and that the across- 
domain shared variance is low (e.g., the new cognitive measures 
have low correlations with the non-cognitive measures). 

6. and 7. Determine the level of prediction of soldier performance,
classification efficiency, and incremental validity of the new
measures--The jury is still out on these questions since the data that
will enable us to address these objectives have not yet been analyzed.
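
The notion under objective 5 of variance "not shared with the ASVAB" can be given a simple numerical form. One common index, assumed here for illustration and not confirmed by this section as the project's own formula, subtracts the squared multiple correlation of a new measure with the ASVAB subtests from the measure's reliability. The function name and the values below are hypothetical.

```python
def uniqueness(reliability, r2_with_asvab):
    """Reliable variance of a new measure that is not predictable from the ASVAB.

    Assumed index: reliability minus squared multiple correlation with the
    ASVAB subtests, floored at zero.
    """
    for name, value in (("reliability", reliability),
                        ("r2_with_asvab", r2_with_asvab)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return max(0.0, reliability - r2_with_asvab)


# Hypothetical values for illustration; these are not results from the report.
example_measures = {
    "spatial test": (0.85, 0.40),
    "temperament scale": (0.80, 0.05),
}
for label, (rel, r2) in example_measures.items():
    print(label, round(uniqueness(rel, r2), 2))
```

Under this index a highly reliable measure that the ASVAB predicts well is still rated as contributing little that is new, which is the sense of "uniqueness" used in the summary.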

We turn now to a description of the initial research activities devoted
to development of new predictors, specifically: literature review;
expert judgments; development, administration, and analysis of the
Preliminary Battery; and initial development of a computer battery. As Figure
1.2 shows, all of these activities led up to the development of the Pilot 
Trial Battery. 
LITERATURE REVIEW

Purpose

The overriding purpose of the literature review was, simply put, to
make maximum use of earlier research on the problem of accurately [...]
classifying persons into jobs in such a way that the person and the
organization receive maximum benefits. More specifically, we wished to
identify those variables or constructs, and their measures, that had proven
effective for such purposes. As Figure 1.2 shows, the information obtained
from the literature review was used in all the immediately succeeding
research activities.

Search Procedures

The search was conducted by the three research teams, each responsible
for a fairly broadly defined area of human abilities or characteristics:
cognitive abilities; non-cognitive characteristics such as [...]
temperament; and psychomotor/physical abilities. While these domains were
convenient for purposes of organizing and conducting literature search
activities, they were not used as (nor intended to be) a final taxonomy of
possible predictor measures.

The literature search was conducted in late 1982 and early 1983. In 
each of the three areas, the teams carried out essentially the same steps: 

1. Compile a list of [...] books, or other sources.

2. Review each source and determine its relevancy for the project by
examining the title and abstract (or other brief review).

3. Obtain the sources identified as relevant in the second step.

4. For the relevant materials, carry out a thorough review and transfer
the information to the special review forms developed for the project.

In the first step, several activities were designed to ensure as [...].
[...] computerized searches of relevant data bases were done; Appendix A
names and describes the data bases [...] sources [...] many of these
sources were [...] counted more [...].

In addition to the computerized searches, we obtained reference lists
[...], emphasizing the most recent research in the field. We also obtained
several annotated bibliographies from [...]. Finally, we [...] the last
several [...] journals that are frequently used in each ability area, as
well as more general sources such as textbooks, handbooks [...]
conceptually distinct areas of psychology.



The vast majority of the sources identified were not relevant to our
purpose, that is, the identification and development of promising measures
for personnel selection in the U.S. Army. These nonrelevant sources were
weeded out in Step 2. The relevant sources were obtained and reviewed, and
two forms were completed for each source: an Article Review form and a
Predictor Review form (several of the latter could be completed for each
source). These forms were designed to capture, in a standard format, the
essential information about the reviewed sources, which varied considerably
in their organization and reporting styles.

The Article Review form contained seven sections: citation, abstract,
list of predictors (keyed to the Predictor Review forms), description of 
criterion measures, description of sample(s), description of methodology, 
other results, and reviewer's comments. The Predictor Review form also 
contained seven sections: description of predictor, reliability, norms/ 
descriptive statistics, correlations with other predictors, correlations 
with criteria, adverse impact/differential validity/test fairness, and 
reviewer's recommendations (about the usefulness of the predictor). Each 
predictor was tentatively classified into an initial working taxonomy of
predictor constructs (based primarily on the taxonomy described in Peterson
and Bownas, 1982). Appendix B contains copies of these two forms. 

Literature Search Results 

The Review forms and the actual sources that had been located were 
used in two primary ways. First, three working documents were written, one 
for each of the three areas. (These working documents were put into ARI
Research Note form: Toquam, Corpe, Dunnette and Keyes, in preparation;
McHenry and Rose, in preparation; Hough, Kampe, and Barge, in preparation.)
These documents identified and summarized the literature with regard to 
issues important to the research being conducted, the most appropriate 
organization or taxonomy of the constructs in each area, and the validities 
of the various measures for different types of job performance criteria. 
Second, the predictors identified in the review were subjected to further, 
structured scrutiny in order to (1) select tests and inventories to make
up the Preliminary Battery, and (2) select the "best bet" predictor
constructs to be used in the expert judgment research activity. 

Screening of Predictors 

An initial list was compiled of all predictor measures that seemed
even remotely appropriate for Army selection and classification. This list
was further screened by eliminating measures according to several
"knockout" factors: (1) measures developed for a single research project
only; (2) measures designed for a narrowly specified population/occupational
group (e.g., pharmacy students); (3) measures targeted toward younger age
groups; (4) measures requiring special apparatus for administration;
(5) measures requiring unusually long testing times; (6) measures requiring
difficult or subjective scoring; and (7) measures requiring individual
administration.

Knockout factor (4) was applicable only with regard to screening for 
the Preliminary Battery, which could not have any computerized tests or 
other apparatus since it was to be administered early in the project,
before such testing devices could be developed. Factor (4) was not applied
with regard to screening measures for inclusion in the expert judgment 
process. 

Application of knockout factors resulted in a second list of candidate 
measures. Each of these measures was evaluated on the 12 factors shown in 
Figure 1.3, by at least two researchers. (A 5-point rating scale was 
applied to each of the 12 factors.) Discrepancies in ratings were resolved 
by discussion. We point out that there was not always sufficient informa- 
tion for a variable to allow a rating on all factors. 
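
The two-stage screening described above (knockout factors, then ratings on the 12 factors of Figure 1.3 by at least two researchers) can be sketched as a small program. This is only an illustration of the logic as stated in the text; the data structures, function names, and example ratings are hypothetical and do not reproduce any project software.

```python
# Hypothetical encoding of the seven knockout factors listed in the text.
KNOCKOUT_FACTORS = {
    1: "developed for a single research project only",
    2: "designed for a narrowly specified population/occupational group",
    3: "targeted toward younger age groups",
    4: "requires special apparatus for administration",
    5: "requires unusually long testing times",
    6: "requires difficult or subjective scoring",
    7: "requires individual administration",
}

N_RATING_FACTORS = 12  # the 12 evaluation factors of Figure 1.3


def passes_knockout(flags, for_preliminary_battery):
    """Return True if no applicable knockout factor is flagged.

    flags: set of knockout factor numbers (1-7) that apply to a measure.
    Factor 4 is applied only when screening for the Preliminary Battery.
    """
    unknown = set(flags) - set(KNOCKOUT_FACTORS)
    if unknown:
        raise ValueError(f"unknown knockout factors: {sorted(unknown)}")
    active = set(flags)
    if not for_preliminary_battery:
        active.discard(4)
    return not active


def factors_to_discuss(ratings_a, ratings_b):
    """List the evaluation factors on which two raters disagree.

    ratings_*: dict mapping factor number (1-12) to a 1-5 rating, or None
    (or a missing key) when there was too little information to rate.
    """
    flagged = []
    for factor in range(1, N_RATING_FACTORS + 1):
        a = ratings_a.get(factor)
        b = ratings_b.get(factor)
        if a is None or b is None:
            continue  # insufficient information; nothing to compare
        if a != b:
            flagged.append(factor)
    return flagged


# Hypothetical measure that needs special apparatus (factor 4).
print(passes_knockout({4}, for_preliminary_battery=True))
print(passes_knockout({4}, for_preliminary_battery=False))

# Hypothetical ratings from two researchers on three of the 12 factors.
rater_a = {1: 4, 2: 3, 6: None}
rater_b = {1: 4, 2: 5, 6: 2}
print(factors_to_discuss(rater_a, rater_b))
```

Treating any difference in ratings as a discrepancy is an assumption of the sketch; the text says only that discrepancies were resolved by discussion.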

This second list of measures, each with a set of evaluations, was
input to (1) the final selection of measures for the Preliminary Battery
and (2) the final selection of constructs to be included in the expert
judgment process, to which we now turn.
1. Discriminability - extent to which the measure has sufficient score
range and variance, i.e., does not suffer from ceiling and floor
effects with respect to the applicant population.

2. Reliability - degree of reliability as measured by traditional
psychometric methods such as test-retest, internal consistency, or parallel
forms reliability.

3. Group Score Differences (Differential Impact) - extent to which there
are mean and variance differences in scores across groups defined by
age, sex, race, or ethnic groups; a high score indicates little or no
mean differences across these groups.

4. Consistency/Robustness of Administration and Scoring - extent to which
administration and scoring is standardized, ease of administration and
scoring, consistency of administration and scoring across administrators
and locations.

5. Generality - extent to which predictor measures a fairly general or
broad ability or construct.

6. Criterion-Related Validity - the level of correlation of the predictor
with measures of job performance, training performance and
turnover/attrition.

7. Construct Validity - the amount of evidence existing to support the
predictor as a measure of a distinct construct (correlational studies,
experimental studies, etc.).

8. Face Validity/Applicant Acceptance - extent to which the appearance
and administration methods of the predictor enhance or detract from
its plausibility or acceptability to laymen as an appropriate test for
the Army.

9. Differential Validity - existence of significantly different
criterion-related validity coefficients between groups of legal or
societal concern (race, sex, age); a high score indicates little or
no differences in validity for these groups.

10. Test Fairness - degree to which slopes, intercepts, and standard
errors of estimate differ across groups of legal or societal concern
(race, sex, age) when predictor scores are regressed on important
criteria (job performance, turnover, training); a high score indicates
fairness (little or no differences in slopes, intercepts, and standard
errors of estimate).

11. Usefulness for Classification - extent to which the measure or
predictor will be useful in classifying persons into different specialties.

12. Overall Usefulness for Predicting Army Criteria - extent to which
predictor is likely to contribute to the overall or individual
prediction of criteria important to the Army (e.g., AWOL, drug use,
attrition, unsuitability, job performance, and training).



Figure 1.3. Factors used to evaluate predictor measures for the Preliminary Battery. 
EXPERT JUDGMENTS



Approach and Rationale 

The approach used in the expert judgment process was to (1) identify 
criterion categories, (2) identify an exhaustive range of psychological 
constructs that may be potentially valid predictors of those criterion 
categories, and (3) obtain expert judgments about the relationships between 
the two. Schmidt, Hunter, Croll, and McKenzie (1983) showed that pooled 
expert judgments, obtained from experienced personnel psychologists, were 
as accurate in estimating the validity of tests as actual, empirical cri- 
terion-related validity research using samples of hundreds of subjects. 
That is, experienced personnel psychologists are effective "validity
generalizers" for cognitive tests. They do tend to underestimate slightly the
true validity as obtained from empirical research. 

Hence, one way to identify the "best bet" set of predictor variables
and measures is to use a formal judgment process employing experts, such as
that followed by Schmidt et al. (1983). Peterson and Bownas (1982) provide
a complete description of the methodology, which has been used successfully 
by Bownas and Heckman (1976), Peterson, Houston, Bosshardt, and Dunnette 
(1977), Peterson and Houston (1980), and Peterson, Houston, and Rosse 
(1984) to identify predictors for the jobs of firefighter, correctional 
officer, and entry-level occupations (clerical and technical), respec- 
tively. Descriptive information about a set of predictors and the job 
performance criterion variables is given to "experts" in personnel selec- 
tion and classification, typically personnel psychologists. These experts 
estimate the relationships between predictor and criterion variables by 
rating or directly estimating the value of the correlation coefficients. 

The result is a matrix with predictor and criterion variables as the 
columns and rows, respectively. Cell entries are experts' estimates of the
degree of relationship between the particular predictors and various
criteria. The interrater reliability of the experts' estimates is checked
first. If the estimates are sufficiently reliable (previous research shows
values in the .80 to .90 range for about 10 to 12 experts), the matrix of
predictor-criterion relationships can be analyzed and used in a variety of
ways. By correlating the columns of the matrix, the covariances of the 
predictors can be estimated on the basis of the profiles of their estimated 
relationships with the criteria. These covariances can then be factor 
analyzed to identify predictors that function similarly in predicting 
performance criteria. Similarly, the criterion covariances can be examined 
to identify clusters of criteria predicted by a common set of predictors. 
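
The matrix analysis described above can be sketched in a few lines. This is
only an illustration, assuming NumPy and a small made-up matrix of mean
validity estimates: the values, the matrix size, and the function name
profile_correlations are hypothetical and are not project data. An
eigendecomposition of the correlation matrix stands in for whatever
factor-analytic method was actually applied.

```python
import numpy as np

def profile_correlations(judgments):
    """Correlate the columns of a matrix on their profiles across the rows."""
    # rowvar=False treats each column as one variable.
    return np.corrcoef(judgments, rowvar=False)

# Hypothetical mean estimated validities: 5 criteria (rows) x 4 predictors (columns).
judgments = np.array([
    [0.45, 0.40, 0.10, 0.12],
    [0.35, 0.38, 0.15, 0.10],
    [0.30, 0.25, 0.20, 0.22],
    [0.10, 0.12, 0.40, 0.35],
    [0.05, 0.10, 0.45, 0.42],
])

# Predictors that are rated similarly across criteria correlate highly.
predictor_corr = profile_correlations(judgments)      # 4 x 4
# Transposing gives the same analysis for the criteria.
criterion_corr = profile_correlations(judgments.T)    # 5 x 5

# Eigenvalues of the predictor correlation matrix, largest first; a few large
# eigenvalues suggest groups of predictors with similar validity profiles.
eigenvalues = np.linalg.eigvalsh(predictor_corr)[::-1]
```

In this toy matrix the first two predictors share one profile and the last
two share another, so the first pair correlates positively and the two pairs
correlate negatively with each other.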

Such procedures help identify redundancies and overlap in the predic- 
tor set. The common sets or clusters of predictors and of criteria are an 
important product for several reasons. First, they provide an efficient 
means of summarizing the data generated by the experts. Second, the sum- 
mary form allows easier comparison with the results of meta-analyses of 
criterion-related validity coefficients. Conflicting or absent evidence is 
a sure guide to important research questions. Certain clusters may have to 
be reconfigured because of new data. Third, less direct but potentially 
more important, these clusters provide a model or theory of predictor- 
criterion performance space. This model serves as an informative guide to 
development of a set of predictors that should be efficient and valid, at
least insofar as the informed opinion of knowledgeable experts can take
one in that direction.



To carry out the expert judgment activity, we had to identify predictor
and criterion variables and prepare materials that would enable the
experts to provide reliable estimates of validity.

Identification of Predictor Variables

The list of predictor variables that had been evaluated on 12 relevant 
factors (see Literature Review, Screening of Predictors) was used to
identify the predictors for the expert judgment process. Variables were
included if they received generally high evaluations and if they added to the
comprehensiveness of coverage for a particular domain of predictor
variables. At this point, we began to depart somewhat from the initial
predictor taxonomy used in the literature review, and to create a new one
that we felt best represented the entire predictor domain relevant to our
Army goal. There were 53 members in the final set of predictor variables. (The
names and definitions of these variables are shown in Appendix C.) 

Materials describing each of the 53 variables were prepared. The 
expert judges were experienced psychologists who were generally familiar 
with psychometric information and, in varying degrees, knowledgeable about 
the 53 variables in our final list. Therefore, the descriptive material 
was designed to transmit a large amount of information as concisely as
possible. 

Each packet contained a sheet that named and defined the variable, de- 
scribed how it was typically measured, and summarized the reliability and 
validity of the selected measures of the variable. Following this sheet 
were descriptions of one or more specific measures, including the name of 
the test, its publisher, the variable it was designed to measure, a de- 
scription of the items and the number of items on the test (in most cases, 
sample items were included), a brief description of the administration and
scoring of the test, and brief summaries of studies of the reliability and
validity of the measure. 

Identification of Criterion Variables

Several types of criterion variables were identified. They included a
set of specific job task criterion categories, a set that described
performance in initial Army training, and a set of generalized Army effectiveness
categories. 

Specific Job Task Categories. Short of enumerating all job tasks in
the nearly 240 entry-level job specialties, the nature of the performance
domain had to be characterized in a way that was at once comprehensive,
understandable, and usable by judges. Since many jobs share similar tasks,
the abstraction of generic task categories was possible. Two approaches
were tried; we report here only on the method chosen.

This approach was based on more general job descriptions of a
representative sample of 111 jobs that had been previously clustered by
personnel experts familiar with Army jobs. Twenty-three clusters had been
identified. Criterion categories were developed by reviewing the descriptions
of the jobs in these clusters to determine common job activities. Emphasis
was placed on determining what a soldier in each job might be observed 
doing and what he or she might be trying to accomplish. The categories 
were constructed to connote a set of actions that typically occur together 
(e.g., transcribe, annotate, sort, index, file, retrieve) leading to some
common objective (e.g., record and file information). Criterion categories
often included reference to the use of equipment or other objects. 

Once criterion categories were identified for the common actions in 
the 23 clusters, additional categories were identified to cover unique 
aspects of jobs in the sample of 111. In all, 53 categories were gen- 
erated. Most of the categories applied to several jobs, and most of the 
jobs were characterized by activities from several categories. Their names 
and definitions are shown in Appendix C. 

Performance in Initial Army Training. Two sources of information were
used to identify appropriate training performance variables: archival
records of soldiers' performance in training were examined, and trainers
were interviewed. This information was obtained for eight MOS:
Radio/Teletype Operator, MANPADS Crewman, Light Vehicle/Power Generation
Mechanic, Motor Transport Operator, Food Service Specialist, M60 and M1 Armor
Crew, Administrative Specialist, and Unit Supply Specialist. These specialties
represented a heterogeneous group with respect to type of work and were,
for the most part, high-density MOS.

The review of archival records was intended to identify the type of
measures used to evaluate training performance, since the content was, ob- 
viously, specific to each MOS. 

Five or six trainers were interviewed for each MOS, using a modified
critical incidents approach. Trainers were asked, "What things do trainees
do that tell you they are good (or bad) trainees?" Generally, trainers
responded with fairly broad, trait-like answers, and appropriate follow-up
questions were used to obtain more specific, behaviorally oriented
information.

After examining the archives and conducting the interviews, we pooled 
and categorized the information from both sources. We found much overlap
across MOS in the way training performance was evaluated. Furthermore, we
could not include content-specific variables since this would require
several hundred training performance variables (one for each MOS, at
least). Nor did we wish to do so, since the task or MOS-specific
performance variance was covered elsewhere, as described above.

In the end, we decided that four variables adequately represented 
training performance. Their names and definitions are shown in Appendix C. 

Generalized Army Effectiveness Categories. The identification of
these variables was carried out in three steps. First, we developed a 
preliminary conceptual model based on relevant theory and empirical 
findings. Second, empirical research using the inductive behavioral analy- 
sis method was carried out to verify and modify the preliminary model. 
Finally, several criterion variables that are common across all MOS but are 
not behavioral in nature were added to the final list. We briefly
summarize those steps here; a more complete description can be found in a
paper by Borman, Motowidlo, and Hanser (1983).

The preliminary model revolved around three concepts: organizational 
commitment, organizational socialization, and morale. Each of these was
thought to contribute to generalized Army effectiveness. Consideration of
theory and research in these areas led to the identification and definition
of 15 general Army effectiveness dimensions. 

Behavioral analysis workshops were employed in order to verify and 
extend this model. Persons knowledgeable about a job were asked to
generate behavioral examples of effective and ineffective performance in all
aspects of the job. Army NCOs and officers generated several hundred
examples, which were then content analyzed by Project A staff. The
resulting categories were compared to the dimensions in the preliminary
model. There was considerable overlap, but some modifications were made to
the model dimensions. Nine general effectiveness behavioral dimensions 
were named and defined; these are shown in Appendix C. 

In the final step, six more criterion variables indicating general 
effectiveness were added; they are also named and defined in Appendix C. 
The first two, "Survive in the field" and "Maintain physical fitness,"
were added because they are expected of all soldiers but did not emerge
elsewhere. The last four are all important "outcome" criterion variables.
That is, they represent outcomes of individual behavior that have negative
or positive value to the Army, but the outcomes could occur because of a
variety of individual behaviors.

In all, then, 72 criterion variables were identified and defined for
use in the expert judgment task.

Subjects 

The experts who served as judges were 35 industrial, measurement, or
differential psychologists with experience and knowledge in personnel
selection research and/or applications. Each expert was an employee of or
consultant to one of the four organizations involved in Project A: U.S.
Army Research Institute, Personnel Decisions Research Institute, Human
Resources Research Organization, American Institutes for Research. Not all
of the employees were directly involved with Project A although all of the
consultants were. 

Instructions and Procedures

Detailed instructions were provided for each judge along with the
materials describing the predictor and criterion variables. Information
was provided on the concept of "true validity," criterion-related validity
corrected for such artifacts as range restriction and unreliability, and
unaffected by variation in sample sizes. Judges were asked to estimate the 
level of true validity rather than estimated validity, on a 9-point scale. 
A rating of "1" meant a true validity in the range of .00 to .10; "2", .11 
to .20; and so forth, to "9", .81 to .90. 
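
The 9-point scale maps each rating to a band of true validity. As a minimal
sketch, the hypothetical helper below returns that band for a given rating.
The instructions also allowed negative judgments; treating a negative rating
as the mirror image of the positive band is an assumption made here for
illustration, not something the report specifies.

```python
def rating_to_validity_range(rating):
    """Return the (low, high) true-validity band for a signed 1-9 rating."""
    magnitude = abs(rating)
    if not 1 <= magnitude <= 9:
        raise ValueError("rating magnitude must be between 1 and 9")
    # A rating of 1 covers .00 to .10; rating k (k >= 2) covers
    # .10*(k-1) + .01 up to .10*k, e.g. 2 -> .11 to .20.
    low = 0.0 if magnitude == 1 else round(0.10 * (magnitude - 1) + 0.01, 2)
    high = round(0.10 * magnitude, 2)
    if rating < 0:
        # Assumption: a negative rating mirrors the band around zero.
        return (-high, -low)
    return (low, high)
```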

Descriptions of the 53 predictor variables had been divided into three 
groups (A, B, and C, two groups of 18 and one of 17). The 72 criterion 
descriptions were in one group. The judges were encouraged to skim the 
materials for a few predictors and for all the criteria before
the rating task. 



Each judge then estimated the validity of each predictor for each
criterion. The order of the predictor groups (A, B, C) was counterbalanced
across judges, with about one-third of the 35 judges beginning with Group A
(Predictors 1-18), another one-third with Group B (Predictors 19-36), and
the rest with Group C (Predictors 37-53).
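
One simple way to produce a near-equal split of starting groups is to assign
them in rotation. The report does not say how judges were actually assigned,
so the sketch below is an assumption offered only to illustrate the idea of
counterbalancing; the function name is hypothetical.

```python
from itertools import cycle

def assign_starting_groups(n_judges, groups=("A", "B", "C")):
    """Assign each judge a starting predictor group in rotation."""
    rotation = cycle(groups)
    return [next(rotation) for _ in range(n_judges)]

starts = assign_starting_groups(35)
# Number of judges beginning with each group.
counts = {group: starts.count(group) for group in ("A", "B", "C")}
```

With 35 judges, rotation cannot give three equal groups, which is consistent
with the "about one-third" wording in the text.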

Ratings were made on separate Judgment Record Sheets. Before making
any judgments about a predictor, the expert was to read the description and
review the examples given to measure it; judgments were to be made about
the predictor as a construct, not about the variable as measured by any 
specific instrument. The judge was then to read the description of the 
first criterion and to estimate the validity of that predictor for that 
criterion. Judgments could be either positive or negative; positive signs 
were not to be entered. The judge was then to read the description of the 
second criterion and rate the validity of the same predictor for that 
criterion. The judge was to estimate the validities of the first predictor 
variable for all 72 criteria before moving to the next predictor. 

All judges completed the task during the first week of October 1983. 

A number of analyses were carried out: reliability of the judgments, 
means and standard deviations of the estimated validities within each 
predictor/criterion cell and for various marginal values, and factor an- 
alyses of the predictors (based on their validity profiles across the 
criteria) and the criteria (based on their validity profiles across the 
predictors). 

The estimated validities were highly reliable when averages were used. 
The reliability of the mean estimated cell validities was .96. The factor 
analyses were based on these cell means. The most pertinent analysis for
purposes of this report concerns the factor analysis of the predictors. 
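
The report does not state how the reliability of the cell means was computed.
As an illustration only, assuming the Spearman-Brown formula for the
reliability of an average of k judges, the sketch below relates a
single-judge reliability to the reliability of the pooled mean. Both
function names are hypothetical.

```python
def pooled_reliability(single_judge_r, k):
    """Spearman-Brown: reliability of the mean of k judges."""
    return k * single_judge_r / (1 + (k - 1) * single_judge_r)

def single_judge_reliability(pooled_r, k):
    """Invert Spearman-Brown to get the implied single-judge reliability."""
    return pooled_r / (k - (k - 1) * pooled_r)

# Single-judge value implied by a pooled reliability of .96 across 35 judges,
# under the Spearman-Brown assumption stated above.
implied = single_judge_reliability(0.96, 35)
```

Under this assumption, a pooled value of .96 for 35 judges would correspond
to a single-judge reliability of roughly .41, which shows how averaging over
many judges can yield highly reliable means from only moderately reliable
individual ratings.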

Factor solutions with two through 24 factors were calculated. The 
nine-factor solution was selected as most meaningful. Eight of the nine 
factors were interpretable; one was not interpreted. The eight
interpretable factors were named: Cognitive Abilities, Visualization/Spatial,
Information Processing, Mechanical, Psychomotor, Social Skills, Vigor, and
Motivation/Stability.

These eight factors appeared to be composed of 21 clusters, based on 
the profile of loadings of each predictor variable across all the factors. 
This hierarchical structure of the predictor variables is shown in 
Figure 1.4. Inspection of the profiles clarifies the meanings both of the 
factors and of the clusters, as follows. 

The eight predictor factors divide the predictor domain into reasonable- 
appearing parts. The first five refer to abilities and skills in the 
cognitive, perceptual, and psychomotor areas while the last three refer to 
traits or predispositions in the noncognitive area. Most of the
representative measures of the constructs defining the first five factors are of
maximal performance while most of the representative measures of the last
three factors are of typical performance, with the exception of the in- 
terest variables. 

The first four factors, which include 11 clusters of 29 predictor
constructs or variables, are cognitive-perceptual in nature. The first
factor, labeled "Cognitive Abilities," includes seven clusters, five of which
appear to consist of more traditional mental test variables: Verbal
Ability/General Intelligence, Reasoning, Number Ability, Memory, Closure. The
Perceptual Speed and Accuracy cluster is linked to measures having a long
history of inclusion in traditional mental tests. The seventh cluster,
Investigative Interests, refers to no cognitive test at all but does tap
interest in things intellectual, the abilities for which are evaluated in 
this factor. 

The second factor, Visualization/Spatial, consists of only one cluster
but includes six constructs which have some history of assessment of
spatial ability. Two of the clusters from the Cognitive Abilities factor,
Reasoning and Closure, have some affinity to this second factor, as may be
seen in the factor analysis data. This may be due to the tasks used to
illustrate the assessment of the constructs, which are to solve problems of
a visual and nonverbal nature. The third factor, Information Processing,
also consists of only one cluster, with the three constructs referring more 
directly to cognitive-perceptual functioning rather than accumulated knowl- 
edge and/or structure. 

The fourth factor, Mechanical, includes two clusters, one of which
consists only of the construct of Mechanical Comprehension while the other 
is, again, an interest cluster consisting of a positive loading for Realis- 
tic Interests and negative loading for Artistic Interests. 

The fifth factor, Psychomotor, consists of three clusters which
include the nine psychomotor constructs. The first cluster,
Steadiness/Precision, refers to aiming and tracking tasks, where the target may move
steadily or erratically. The second cluster, Coordination, indexes the
large-scale complexity of the response required in a psychomotor task while
the third cluster, Dexterity, appears to index the small-scale complexity of
responses.

The remaining three factors, noncognitive in character, refer more to
interpersonal activities. The Social Skills factor consists of two
clusters. The first, Sociability, refers to a general interest in people while
the second, Enterprising Interests, refers to a more specific interest in
working successfully with people. The seventh factor is called "Vigor" as
it includes two clusters that both refer to general activity level. The
first, Athletic Abilities/Energy, includes two constructs which point
toward a physical perspective while the second cluster,
Dominance/Self-Esteem, points toward a psychological perspective.

The eighth and last factor, Motivation/Stability, includes three
clusters or facets. The first, Traditional Values, includes both temperament
measures and interest scales, and refers to being rule-abiding and a good
citizen. The second cluster, Work Orientation, refers to temperament
measures which index attitudes towards the individual vis-a-vis his/her
efforts in the world. The third cluster, Cooperation/Stability, appears to
refer to skill in getting along with people, including getting along with
oneself in a healthy manner. 

The expert judgment task resulted in a hierarchical model of predictor 
space that served as a guide for the development of new pre-enlistment
measures (the Pilot Trial Battery, see Figure 1.2) for Army enlisted ranks.
(Wing, Peterson, and Hoffman, 1984, provide a detailed presentation of the
expert judgment process and results.) This model was not the only set of
information that guided the development of the Pilot Trial Battery, how- 
ever. We turn now to the other major source of guidance, the development, 
administration, and initial analyses of the Preliminary Battery. 
PRELIMINARY BATTERY



The Preliminary Battery (PB) was conceived of as a set of proven
"off-the-shelf" measures of predictors that overlapped very little with the
Army's current pre-enlistment predictors. There were two primary reasons
for developing and administering a Preliminary Battery. First, the
collection of data on a number of predictors that represent the types of
predictors not currently in use by the Army would allow an early determination of
the extent to which such predictors contributed unique variance, that is,
measured attributes not measured by current pre-enlistment predictors. This
information would be useful for guiding the development of new predictors
into areas most likely to be useful for increasing the accuracy of predic- 
tion and classification. 

Second, the collection of predictor data (from soldiers in training) 
early in the project allowed the conduct of a predictive validity inves- 
tigation much earlier in the project than if we were to wait until the 
Trial Battery was developed (see Figure 1.2). Thus, the extent to which 
the different (from ASVAB) constructs represented in the Preliminary Bat- 
tery added to the prediction of training success and effectiveness of job 
performance could be ascertained via a predictive design approximately 18 
months and 36 months after Project A began, rather than many months later 
than that. 

Selection of Preliminary Battery Measures

As described earlier, the literature review identified a large set of 
predictor measures, each with ratings by the researchers on 12 psychometric 
and substantive evaluation factors (see Figure 1.3). These ratings were 
used to select a smaller set of measures as serious candidates for inclu- 
sion in the Preliminary Battery. Two major practical constraints came into 
play: (1) no apparatus or individualized testing methods could be used
because of the relatively short time available to prepare for battery
administration, and the fact that the battery would be administered to a
large number of soldiers (several thousand) over a 9-month period by
relatively unsophisticated test administrators, and (2) only 4 hours were
available for testing. 

Task 2 researchers made an initial selection of "off-the-shelf" mea- 
sures, but there were still too many measures for the time available. The 
tentative list was referred to the Army Research Institute scientists 
responsible for Task 2 specifically, and Project A generally, and to the 
Project A Director and Principal Investigator. The available information 
about each measure (construct measured, psychometric characteristics, type 
of job performance criteria it had predicted or was thought likely to 
predict) was presented and discussed. The set of measures selected was 
then reviewed by several consultants external to Project A, who had been 
retained for their expertise in various predictor domains. These experts 
made several "fine-tuning" suggestions. 

The Preliminary Battery included the following: 

o Eight perceptual-cognitive measures

- Five from the Educational Testing Service (ETS) French Kit
(Ekstrom, French, and Harman, 1976)

- Two from the Employee Aptitude Survey (EAS) (Ruch and Ruch, 1960)

- One from the Flanagan Industrial Tests (FIT) (Flanagan, 1965)

o Eighteen scales from the Air Force Vocational Interest Career
Examination (VOICE) (Alley and Matthews, 1982)

o Five temperament scales adapted from published scales

- Two from the Differential Personality Questionnaire (DPQ)
(Tellegen, 1982)

- One from the California Psychological Inventory (CPI) (Gough,
1975)

- The Rotter I/E scale (Rotter, 1966)

- Validity scales from both the DPQ and the Personality Research
Form (PRF) (Jackson, 1967)

o Owens' Biographical Questionnaire (BQ) (Owens and Schoenfeldt,
1979). The BQ could be scored for either 11 scales for males or 14
for females, based on Owens' research, or for 18 predesignated,
combined-sex scales developed for this research and called Rational
Scales. The rational scales had no item on more than one scale;
some of Owens' scales included items on more than one scale. Items
tapping religious or socio-economic status were deleted from Owens'
instrument for this use, and items tapping physical fitness and
vocational-technical course work were added.

Appendix D shows all the scale names and numbers of items for the
Preliminary Battery. 

In addition to the Preliminary Battery, scores were available for the 
Armed Services Vocational Aptitude Battery, which all soldiers take prior 
to entry into service. ASVAB's ten subtests are named below, with the test 
acronym and number of items in parentheses:

Word Knowledge (WK:35), Paragraph Comprehension (PC:15),
Arithmetic Reasoning (AR:30), Numerical Operations (NO:50),
General Science (GS:25), Mechanical Comprehension (MC:25), 
Math Knowledge (MK:25), Electronics Information (EI:20), 
Coding Speed (CS:84), Auto-Shop Information (AS:25). 

All but NO and CS are considered to be power tests; the two exceptions 
are speeded. Prior research (in Kass, Mitchell, Grafton, & Wing, 1983) has
shown the reliability of the subtests to be within expectable limits for
cognitive tests of this length (i.e., .78-.92).
Sample and Administration of Battery



The Preliminary Battery was administered to soldiers entering Advanced 
Individual Training (AIT) for four MOS: 05C, Radio Teletype Operator (MOS 
code was later changed to 31C); 19E/K, Armor Crewman; 63B, Vehicle and
Generator Mechanic; and 71L, Administrative Specialist. Almost all sol- 
diers entering AIT for these MOS during the period 1 October, 1983 to 30 
June, 1984 completed the Preliminary Battery. We are here concerned only 
with the sample of soldiers who completed the battery from 1 October, 1983 
to 1 December, 1983, approximately 2,200 soldiers. 

The battery was administered at five training posts by civilian or 
military staff already employed on site. Task 2 staff traveled to these 
sites to deliver battery administration manuals and to train the persons 
who would administer the battery. A full day of training was provided, in- 
cluding a complete reading of the administration manual, role-playing 
practice in reading test and inventory instructions, completion of all 
tests and inventories by the administrators, and question-and-answer ses- 
sions about each chapter of the administration manual. Thereafter, Task 2 
staff contacted each post each week by telephone to receive progress
reports and answer questions. Administrators at posts also called Task 2
staff whenever they had questions. The experience in training battery 
administrators and monitoring the administration over the nine-month period 
provided useful information for the data collection efforts involving the 
Pilot Trial Battery and Trial Battery.

The Preliminary Battery was administered to a sample
of 40 soldiers at Fort Leonard Wood prior to its implementation in order to 
test the instructions, timing, and other administration procedures. The 
results of this tryout were used to adjust the procedures, prepare the 
manual, and identify topics to be emphasized during administrator training. 

Analyses 

An initial set of analyses was performed on the Preliminary Battery 
data to inform the development of the Pilot Trial Battery (PTB). (The PTB 
was intended to include newly developed tests and inventories that would 
measure the important abilities and traits identified via the literature 
review and expert judgment process. These PTB measures would be piloted 
and field tested and then revised to become the Trial Battery. See 
Figure 1.2 for a flow chart showing the sequencing of the various bat- 
teries.) We summarize those findings here. They are more completely 
reported in Hough, Dunnette, Wing, Houston, and Peterson (1984). 

Three types of analyses were done. First, the psychometric
characteristics of each scale were explored to pinpoint possible problems with
the measures or the construct being measured, so those problems could be
avoided when the Pilot Trial Battery measures were developed. These
analyses included descriptive statistics, item analyses (including numbers of
items attempted in the time allowed), internal consistency reliability
estimates, and, for the temperament inventory, percentage of subjects
failing the scales intended to detect random or improbable response
patterns.

Second, the covariances of the scales within and across the various
conceptual domains (i.e., cognitive, temperament, biographical data, and
vocational interest) were investigated to detect excessive redundancy among
the PB measures, especially across the domains. If such redundancies were
detected, then steps could be taken to avoid such a problem in the Pilot
Trial Battery. Third, the covariances of the PB scales with ASVAB measures
were studied to identify any PB constructs that showed excessive redundancy
with ASVAB constructs, again so that steps could be taken to alleviate
such problems for the Pilot Trial Battery. Correlation matrices and factor
analyses were the major methods of analysis for these second and third 
purposes. 

The psychometric analyses showed some problems with the cognitive 
tests. The time limits appeared too stringent for several tests, and one 
test. Hidden Figures, appeared to be much too difficult for the population 
being tested. Since most of the cognitive tests used in the Preliminary 
Battery had been developed on college samples or other samples somewhat 
better educated than the population seeking entry into the Army, these 
findings were not unexpected. The lesson learned was that the Pilot Trial 
Battery measures needed to be accurately targeted (in difficulty of items 
and time limits) toward the population of persons seeking entry into the 
Army. No serious problems were unearthed for the temperament, biodata,
and interest scales. Item-total correlations were acceptably high and in
accordance with prior findings, and score distributions were not exces-
sively skewed or different from expectation. About 8% of subjects failed
the scale that screened for inattentive or random responding on the temper-
ament inventory, a figure that is in accord with findings in other selec-
tion research.

Covariance analyses showed that vocational interest scales were rela-
tively distinct from the biographical and temperament scales, but the
latter two types of scales showed considerable covariance. Five factors
were identified from the 40 non-cognitive scales, two that were primarily
vocational interests and three that were combinations of biographical data
and temperament scales. These findings led us to consider, for the Pilot
Trial Battery, combining biographical and temperament item types to measure
the constructs in these two areas. The five non-cognitive factors showed
relative independence from the cognitive PB tests, with the median absolute
correlations of the scales within each of the five factors with each of the
eight PB cognitive tests ranging from .01 to .21. This confirmed our
expectations of little or no overlap between the cognitive and non-cogni-
tive constructs.
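
The summary statistic reported above, the median absolute correlation
between the scales in one non-cognitive factor and one cognitive test, can
be sketched as follows. This is a minimal illustration in Python and not
the analysis code used in the project; the function names and the small
data set are hypothetical and are not project data.

```python
from math import sqrt
from statistics import median


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def median_abs_correlation(factor_scales, test_scores):
    """Median of |r| between each scale in a factor and one cognitive test."""
    return median(abs(pearson_r(scale, test_scores)) for scale in factor_scales)


# Hypothetical scores for five examinees (illustration only).
cognitive_test = [10, 12, 15, 11, 18]
factor_scales = [
    [3, 4, 2, 5, 4],
    [7, 6, 8, 6, 7],
    [1, 2, 3, 4, 5],
]
print(round(median_abs_correlation(factor_scales, cognitive_test), 2))
```

Taking the absolute value before the median keeps positive and negative
correlations from cancelling, so a small result indicates little overlap
in either direction.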

Correlations and factor analysis of the ten ASVAB subtests and the
eight PB cognitive tests confirmed prior analyses of the ASVAB (Kass et
al., 1983) and the relative independence of the PB tests. Although some of
the ASVAB-PB test correlations were fairly high (the highest was .57), most 
were less than .30 (49 of the 80 correlations were .30 or less, 65 were .40 
or less). The factor analysis (principal factors extraction, varimax 
rotation) of the 18 tests showed all eight PB cognitive tests loading 
highest on a single factor, with none of the ASVAB subtests loading highest 
on that factor. The non-cognitive scales overlapped very little with the 
four ASVAB factors identified in the factor analysis of the ASVAB subtests 
and PB cognitive tests. Median correlations of non-cognitive scales with 
the ASVAB factors, computed within the five non-cognitive factors, ranged
from .03 to .32, but 14 of the 20 median correlations were .10 or less. 


COMPUTER BATTERY DEVELOPMENT



Roughly speaking, four phases of activities led up to the development
of computerized predictor measures for the Pilot Trial Battery: (1) infor-
mation gathering about past and current research in perceptual/psychomotor
measurement and computerized methods of testing such abilities; (2) con-
struction of a demonstration computer battery, and a continuation of infor-
mation gathering; (3) selection of commercially available microprocessors
and peripheral devices, writing of software for testing several abilities
using this hardware, and tryout of this hardware and software; (4) con-
tinued development of software, and design and construction of a custom-
made peripheral device, which we called a response pedestal.

Background 

Compared to the paper-and-pencil measurement of cognitive abilities 
and the major non-cognitive variables (temperament, biographical data, and 
vocational interests), the computerized measurement of psychomotor and
perceptual abilities was in a relatively primitive state of knowledge.
Much work had been done in World War II using electro-mechanical apparatus,
but relatively little work had occurred since then. Microprocessor tech-
nology held out the promise of revolutionizing measurement in this area.

was (and still is) in its early stages. It was clear, how-
ever, that cognitive ability testing was moving into a computer-assisted
environment through the methodology of adaptive testing. As Project A
began, work was under way to implement the ASVAB via computer-assisted
testing methods in the Military Entrance Processing Stations. Therefore,
it was also sensible from a practical point of view to investigate these
methods of testing.

It was with this backdrop of relatively little research-based knowl- 
edge, excitement at the prospect of microprocessor-driven and, therefore, 
accurate and reliable testing, and the looming implementation of comput- 
erized testing in the military environment, that we began our work. 

Phase 1. Information Gathering

The two major activities in this phase were literature review and
visits to several military laboratories that were engaged in apparatus,
simulator, or microprocessor-driven testing of psychomotor and other abili-
ties.

The literature review procedures were described earlier. Almost no
literature was available on computerized, especially microprocessor-driven,
testing of psychomotor/perceptual abilities for selection/classification
purposes. Considerable literature was available on the taxonomy or struc-
ture of such abilities, based primarily on work done in World War II or
shortly thereafter. Work from this era showed that testing such abilities
with electro-mechanical apparatus did show useful levels of validity for
[...], but that such apparatus had reliability problems. This information
focused our attention on the types of abilities
that would provide an efficient, yet comprehensive, coverage of this abil-
ity domain, confirmed the notion that testing such abilities could yield
useful validities, but emphasized the problems with unreliability in the
use of electro-mechanical apparatus.

To obtain the most current information, in the spring of 1983 we
visited four military laboratories engaged in relevant research: the Air
Force Human Resources Laboratory (AFHRL), Brooks Air Force Base; the Naval
Aerospace Medical Research Laboratory (NAMRL), Pensacola Naval Air Station;
the Army Research Institute Field Unit at Fort Rucker, Alabama; and the
Army Research Institute Field Unit at Fort Knox, Kentucky. We were primar-
ily after the answers to five questions:

1. What computerized measures are in use? 

We found more than sixty different measures in use across the four
sites. (Appendix E shows the names, location, and associated hard-
ware/software for these measures.) A sizable number were special-
ized simulators that were not relevant for Project A (e.g., a
helicopter simulator weighing several tons that is permanently
mounted in an air-conditioned building). However, many measures in
the perceptual, cognitive, and psychomotor areas were relevant.

2. What computers were selected for use? and, 

3. What computer languages are being used? 

We observed three different microprocessors in use--the Apple,
Terak, and PDP-11--and three different computer languages--PASCAL,
BASIC, and FORTRAN. There appeared to be relatively little in
common among the four sites in terms of the hardware/software used.

4. How reliable are these computerized measures? and,

5. What criterion-related validity evidence exists for these measures 
so far? 

Data were currently being collected at all four sites to address 
the reliability and criterion-related validity questions, but very 
little documented information was available. The research at AFHRL 
was at the point of administering computerized measures to fairly 
large samples of subjects. This was also true of the research at 
Fort Rucker, where they expected to have validity data collected 
and analyzed by sometime in 1984. 

A number of the measures had been under study at NAMRL for some 
time, but criterion-related validity had not been the primary focus 
of their work. The prototype information processing measures de-
veloped there had been shown to be sensitive to individual differ-
ences within chronological age groups as well as to age-related
changes across different age groups. We were not able to observe
these measures directly as they were being administered off-site,
under NAMRL contract at the Aviation Research Laboratory in Illi-
nois, but the research was described to us in some detail.

The computerized measures at Fort Knox were being analyzed.
Their efforts apparently were hampered by severe range restriction
in the predictors as well as some problems with the criterion
measures. They were finding significant, positive correlations
between microprocessor measures and their higher fidelity, "hands-
on" counterparts.

To summarize, little information was then available on the reliability 
or criterion-related validity of the computerized measures in use at the 
sites. This was not surprising since most of the measures had been devel- 
oped only recently. 

Nevertheless, we learned some valuable lessons. First, large-scale 
testing can be carried out on microprocessor equipment (AFHRL was doing 
so). Second, a variety of software and hardware can produce satisfactory 
results, but we should carefully evaluate options before making these 
choices. Third, it would be highly desirable to have the testing devices 
or apparatus be as compact and simple in design as possible, in order to 
minimize down time and make transportation feasible. Fourth, we began to
form the impression that it would be highly desirable to develop our soft-
ware and hardware devices to be as completely self-administering (i.e.,
little or no input required from test monitors) as possible and as imper- 
vious as possible to prior experience with typewriting and playing video 
games. 

Phase 2. Demonstration Battery

After conducting the site visits, we programmed a short demonstration 
battery in the BASIC language on the Osborne 1, a portable microprocessor. 
The purpose was to implement some of the techniques and procedures observed 
during the visits in order to determine the degree of difficulty of such 
programming, and to get an idea of the quality of results to be expected
from using a common portable microprocessor and a language that is common
but has disadvantages in processing power, speed, and flexibility.

This short battery was self-administering, recorded time-to-answer and 
the answer made, and contained five tests: simple reaction time (pressing 
a key when a stimulus appeared), choice reaction time (pressing one of two 
keys in response to one of two stimuli), perceptual speed and accuracy
(comparing two alphanumeric phrases for similarity), verbal comprehension
(vocabulary knowledge), and a self-rating form (indicating which of two
adjectives best describes the examinee, on a 7-point scale). We also
experimented with the programming of several types of visual tracking
tests, but did not include these in the self-administered demonstration
battery. 

No data were collected with this demonstration battery, but it ful-
filled its intended purposes. Experience in developing and using the
battery convinced us that the BASIC language did not allow enough power and
control of timing events to be useful for our purposes. The basic methods
for controlling stimulus presentation and response acquisition through a
keyboard were thoroughly explored. Techniques for developing a self-
administering battery of tests were tried out.

The second activity during this phase was consultation at the Univer-
sity of Illinois with three experts about perceptual/psychomotor abilities
and their measurement.* We met with them to review what we had learned
from our activities to date, discuss our near-term development plans, and 
get their reactions. We also discussed their program of research in this 
area and observed their computerized testing facility. The major points 
that emerged from this meeting were: 

o Generally speaking, it may be difficult to obtain discriminant
  validity with the addition of new predictors (beyond the ASVAB),
  but the approach being taken by Project A Task 2 seems to allow the
  maximal opportunity for this to occur and it allows the testing of
  the hypothesis.

o The results obtained in World War II using electro-mechanical,
  psychomotor testing apparatus probably do generalize to the present
  era in terms of the structure of abilities and the usefulness of
  such abilities for predicting job performance in jobs like aircraft
  pilot.

o The taxonomy of psychomotor skills and abilities probably should be
  viewed in a hierarchical fashion, and perhaps Project A's develop-
  ment efforts would be best focused on two or three relatively high-
  level abilities such as gross motor coordination, multilimb con-
  stant processing tasks, and fine manipulative dexterity.

o Rate of learning or practice effects are viewed as a major concern
  for evaluating the usefulness of psychomotor ability measures for
  predicting on-the-job performance. If later test performance (af-
  ter many trials) was much more valid than early test performance
  (early trials), or worse, if early test performance was not valid
  and later test performance was, then it is unlikely that psychomo-
  tor testing would be practically feasible in the operational mili-
  tary-selection environment. There are, however, no empirically
  based answers to these questions, and it is acknowledged that
  research is necessary to obtain answers, especially with micropro-
  cessor-driven testing methods.

Phase 3. Selection/Purchase of Microprocessors and Development/Tryout of
Software

On the basis of the information from the first two phases, we defined
the desirable characteristics of a microprocessor useful for our research.
A prime consideration was transportability. Almost all of our pilot 
testing and other data collection efforts would take place at various field 
sites throughout the United States and Europe. We would not be able to 
build a stationary laboratory and bring the soldiers to the site. 

Following are the desired characteristics as we outlined them in the 
Fall of 1983:



* Charles Hulin, John Adams, and Phillip Ackerman were the consultants.

1. Reliability--This encompasses several considerations. First, the
machine should be manufactured and maintained by a company that has
a history of backing its products and, even more basic, is likely
to remain in business. Second, the machine itself should be fairly
rugged and capable of being carried around without breaking down.

2. Portability--Since we will need to transport the computer to sev- 
eral posts during development efforts, the machine should be as 
portable as possible, and, if feasible, extremely easy to assemble 
and disassemble. 

3. Most Recent Generation of Machine--Progress is very rapid in this 
area; therefore, we should get the latest "proven" type of machine. 
That means getting a 16-bit microprocessor rather than an 8-bit 
microprocessor. This way, the software developed will be more 
likely to be usable on future machines. 

4. Compatibility--Although extremely difficult to achieve, a desirable
goal is to have a machine that is maximally compatible with other
machines, or that will have software that will be compatible with
other machines. Thus, we think a CPU-based machine or some version
of the 8088 chip is best.

5. Appropriate Display Size, Memory Size, Disk Drives, Graphics, and
Peripheral Capabilities--We need a video display that is at least
nine inches (diagonally), but it need not be a color monitor. 
Since we will be developing experimental software, we need a rela- 
tively large amount of random access memory, and 256 K seems to be 
the largest memory size that is generally available. (Later 
project efforts to create maximally efficient use of memory may 
considerably reduce this requirement.) Also we require two floppy 
disk drives to store needed software and to record subjects' re- 
sponses. High-resolution graphics capability is desirable for some 
of the kinds of tests we will develop. Finally, since several of 
the ability measurement processes will require the use of paddles, 
joysticks, or other similar devices, the machine must have the 
appropriate hardware and software to allow this. 

The characteristics listed in the above statement were used as cri- 
teria for evaluating commercially available microprocessors. Most machines 
were eliminated because they were very new on the market and thus had no 
history, or they were made by relatively unknown manufacturers. 

In the end we selected Compaq portable microprocessors with 256 K 
random access memory, two 320 K-byte disk drives, a "game board" for ac- 
cepting input from peripheral devices such as joysticks, and software for 
FORTRAN, PASCAL, BASIC, and assembly language programming. Six of these 
machines were purchased in December 1983. We also purchased six commer- 
cially available, dual-axis joysticks. 

We then developed the initial version of the software needed to test 
several perceptual/psychomotor abilities that we were reasonably certain 
would be chosen for final inclusion in the Pilot Trial Battery, although 
those abilities had not yet been finally selected. We had three general, 
operational objectives in mind for the software to be produced: (1) as far 
as possible, it should be transportable to other microprocessors; (2) it
should require as little intervention as possible from a test administrator 
in the process of presenting the tests to subjects and storing the data; 
and (3) it should enhance the standardization of testing by adjusting for 
hardware differences across computers and response pedestals. 

We first had to choose a primary language. We chose to prepare the 
bulk of the software using the PASCAL language as implemented by Microsoft, 
Inc. PASCAL is a common language and it is implemented using a compiler 
that permits modularized development and software libraries. As computer 
languages go, PASCAL is relatively easy for others to read and it can be 
implemented on a variety of computers.

Some processes, mostly those that are specific to the hardware config- 
uration, had to be written in IBM-PC assembly language. Examples include 
interpretation of the peripheral device inputs, reading of the real-time-
clock registers, calibrated timing loops, and specialized graphics and
screen manipulation routines. For each of these identified functions, a 
PASCAL-callable "primitive" routine with a unitary purpose was written in 
assembly language. Although the machine-specific code would be useless on 
a different type of machine, the functions were sufficiently simple and 
unitary in purpose so that they could be reproduced with relative ease. 

The overall strategy of the software development was to take advantage
of each researcher's input as directly as possible. It quickly became 
clear that the direct programming of every item in every test by one person 
(a programmer) was not going to be very successful in terms of either time 
constraints or quality of product. To make it possible for each researcher 
to contribute his/her judgment and effort to the project, it was necessary 
to plan so as to, as much as possible, take the "programmer" out of the 
step between conception and product and enable researchers to create and 
enter items without having to know special programming. 

The testing software modules were designed as "command processors" 
which interpreted relatively simple and problem-oriented commands. These
were organized in ordinary text written by the various researchers using 
word processors. Many of the commands were common across all tests. For 
instance, there were commands that permitted writing of specified text to 
"windows" on the screen and controlling the screen attributes (brightness, 
background shade, etc.); a command could hold a display on the screen for a
period measured to 1/100th-second accuracy. There were commands that
caused the program to wait for the respondent to push a particular button. 
Other commands caused the cursor to disappear or the screen to go blank 
during the construction of a complex display. 

Some of the commands were specific to particular item types. These 
commands were selected and programmed according to the needs of a particu- 
lar test type. For each item type, we decided upon the relevant stimulus 
properties to vary and built a command that would allow the item writer to 
quickly construct a set of commands for items which he or she could then 
inspect on the screen. 

These techniques made it possible for entire tests to be constructed 
and experimentally manipulated by psychologists who could not program a
computer. 
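
The command-processor idea described above can be sketched roughly as
follows. The original software was written in PASCAL with assembly-language
primitives; this Python sketch only illustrates the pattern of reading
plain-text commands and dispatching each one to a small single-purpose
routine. The command names (WRITE, HOLD, WAIT, BLANK), the script syntax,
and the class and function names are all hypothetical, and the handlers
merely log what a real display or timer routine would do.

```python
import shlex


def parse_script(text):
    """Turn a researcher-written script into (command, arguments) pairs.

    Blank lines and lines starting with ';' are ignored. The command names
    used here are hypothetical, not those of the original software.
    """
    commands = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        name, *args = shlex.split(line)
        commands.append((name.upper(), args))
    return commands


class CommandProcessor:
    """Dispatches parsed commands to small, single-purpose handlers."""

    def __init__(self):
        self.log = []  # record of actions, standing in for a real display

    def run(self, commands):
        for name, args in commands:
            handler = getattr(self, "do_" + name.lower(), None)
            if handler is None:
                raise ValueError(f"unknown command: {name}")
            handler(*args)
        return self.log

    def do_write(self, window, text):
        self.log.append(("write", window, text))

    def do_hold(self, hundredths):
        # A real implementation would wait on a calibrated timer here.
        self.log.append(("hold", int(hundredths) / 100.0))

    def do_wait(self, button):
        self.log.append(("wait", button))

    def do_blank(self):
        self.log.append(("blank",))


script = """
; one hypothetical reaction-time item
BLANK
WRITE main "Press the red button when the X appears"
HOLD 150
WRITE main "X"
WAIT RED
"""
for action in CommandProcessor().run(parse_script(script)):
    print(action)
```

Because the items live in an ordinary text file, an item writer can change
wording or timing without touching the program, which is the separation
between researcher and programmer that the text describes.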

As this software was written, we used it to administer the comput-
erized tests to small groups of soldiers (N = 5 or fewer) at the Minnea-
polis Military Entrance Processing Station (MEPS). These soldiers were
told about Project A, that their participation was voluntary and the test
results would not affect their status, but that we needed to have them try
their very best so that we could evaluate the tests. They were also asked 
to write down anything about the tests that bothered them or any problems 
they encountered during the testing, and told that the researchers would 
talk to them about the computerized test battery when they were finished. 
The soldiers completed the battery without assistance from the researchers, 
unless it was absolutely necessary, and were then questioned. 

The nature of these questions varied over the progress of these devel- 
opmental tryouts, but mainly dealt with clarity of instructions, diffi- 
culty of tests or test items, screen brightness problems, difficulties 
using keyboard or joysticks, clarity of visual displays, and their general 
(favorable/unfavorable) reaction to this type of testing. 

These tryouts were held from 20 January 1984 through 1 March 1984, 
and a total of 42 persons participated in nine sessions. The feedback from 
the participants was extremely useful in determining the shape of the 
tests, prior to the first pilot test of the Pilot Trial Battery. After
each tryout, we would modify the software to clarify instructions, make 
item or test difficulties more appropriate, make stimulus displays and 
sequences of events more appropriate, and so forth. We also performed 
simple analyses of the data collected, but mainly to ensure that responses
were being captured and recorded correctly--not for any substantive an-
alyses of the tests or constructs.

At the end of Phase 3, we had developed a self-administering, comput-
erized test battery that was implemented on a Compaq portable computer.
The subjects responded on the normal keyboard for all tests except a 
tracking test that required them to use a joystick. This joystick was a 
commercially available device normally used for video games. Seven dif- 
ferent tests had been programmed. These were not necessarily tests we 
wished to include in the Pilot Trial Battery, but five did eventually end 
up in that battery. 

Phase 4. Continued Software Development and Design/Construction of a
Response Pedestal 

During the fourth phase, several significant events took place during 
March-May 1984. An in-progress-review (IPR) meeting was held at which we 
presented the results of the development efforts to date and received 
guidance on next efforts from ARI staff, the Scientific Advisory Group 
subcommittee assigned to Task 2, and other Project A researchers. We made 
field observations of some combat MOS in order to inform the further devel- 
opment of computerized tests; the first pilot test of the computerized 
battery was completed; and we designed and constructed a custom-made re-
sponse pedestal for the computerized battery.

The primary result of the in-progress-review was the identification 
and prioritization of the ability constructs for which computerized tests 
should be developed. Chapter 5 describes these constructs in some detail. 

A second result of the review was a decision to go to the field to observe
several combat arms MOS in order to target the tests more closely to those
skills, insofar as that was possible.

These field observations subsequently took place at several posts. 
They were relatively informal; we simply observed soldiers (usually a very
small number) working at their jobs in the field and, where possible, asked
questions to clarify their activities. We did complete a brief checklist
that required a rating of the degree of importance for the job of several
cognitive, perceptual, and psychomotor abilities; these checklists were not
formally analyzed but were used for later discussions and development
efforts. We also operated various training aids and simulators available
during our visits. The MOS for which we were able to complete these field
observations were: 11B (Infantryman), 13B (Cannon Crewman), 19K (Armor
Crewman), 16S (MANPADS Crewman), and 05C (Radio Teletype Operator).

On one of these site visits we were able to administer the comput- 
erized battery to several trainers (for Armor Crewman, 19K). The primary 
outcome of their feedback was a decision to develop a test that utilized 
military aircraft and vehicle profiles in an identification task. Their 
suggestion corroborated our field observations that such a test seemed more 
appropriate than a test then in the battery that was intended to predict
skill at target identification (this test had been adapted from the Hidden 
Figures test in the ETS battery). 

The first pilot test of the Pilot Trial Battery occurred at Fort 
Carson during this phase. (See Chapter 2 for a description of the sample 
and procedures of that pilot test.) For the computerized tests, the same 
procedures were used as for the MEPS tryouts described above in Phase 3. A
total of 20 soldiers completed the computerized battery. 

The information from this pilot test primarily confirmed a major
concern that had surfaced during the MEPS tryouts, namely the undesirabil- 
ity of the computer keyboard and commercially available joysticks for 
acquiring test responses. Feedback from subjects (and our observations) 
indicated that (1) it was difficult to pick out one or two keys on the 
keyboard, and (2) fairly elaborate, and therefore confusing, instructions
were needed to use the keyboard in this manner. Even with such instruc- 
tions, subjects often missed the appropriate key, or inadvertently pressed 
the keys because they were leaving their fingers on the key in order to 
retain the appropriate position for response. Also, subjects varied in the 
way they prepared for test items, and the more or less random positioning 
of their hands added unwanted (error) variance to their scores. 

Similar issues arose with the joysticks, but the main problems were
their lack of durability and the large variance across joysticks in their 
operating characteristics, again adding error variance. 

After consultation with ARI and other Project A researchers, we de- 
cided to develop a custom-made response pedestal in an attempt to alleviate 
these problems. We drew up a rough design for such a pedestal and con- 
tracted with an engineering firm to fabricate a prototype. We tried out 
the first prototype, suggested modifications, and had six copies produced 
in time for the Fort Lewis pilot test in June 1984. Chapter 5 describes 
the response pedestal in some detail.

the abilities that had been chosen for inclusion in the Pilot Trial Battery
and (2) accommodate the new response pedestal.

PILOT TRIAL BATTERY

Identification of Measures

In March 1984, an IPR meeting was held to decide on the measures to be 
developed for the Pilot Trial Battery. Information from the literature 
review, expert judgments, initial analyses of the preliminary battery, and 
the first three phases of computer battery development was presented and 
discussed. Task 2 staff made recommendations for inclusions of measures 
and these were evaluated and revised. Figure 1.5 shows the results of that 
deliberation process. (The names of the tests developed for the Pilot 
Trial Battery are shown in the right-hand column of Figure 1.5. Each of 
these tests is dealt with extensively in later chapters, so we make no 
attempt to describe them here.) This set of recommendations served as the
blueprint for Task 2's test development efforts for the next several 
months. 

Pilot Tests and Field Tests 

There were three pilot tests of the measures developed for the Pilot 
Trial Battery. These took place at Fort Carson in April 1984, Fort Camp- 
bell in May 1984, and Fort Lewis in June 1984. At the first two sites not 
all Pilot Trial Battery measures were administered, but the complete bat-
tery was administered at Fort Lewis. Subsequent chapters of this report
describe these pilot tests, resulting analyses, and revisions to measures 
prior to the field tests. The reports of analyses of the pilot test data 
emphasize the Fort Lewis administration because it was the first time the 
complete battery was administered and it was the largest pilot test sample.
(The pilot tests, especially those at Fort Carson and Fort Campbell, are 
often referred to as "tryouts" in the remainder of this report.) 

A field test of the complete Pilot Trial Battery was conducted at Fort 
Knox in September 1984. In addition, supplementary field test studies were 
conducted at Fort Knox, Fort Bragg, and the Minneapolis MEPS during the 
Fall of 1984. Following analysis of the field test results, the test 
battery was revised for use in the Concurrent Validation administration. 

The data collection procedures and samples for the various tests are 
described in Chapter 2 of this report. Description of the measures them- 
selves, and of the results of the tests and analyses, is organized by the 
major types of predictor categories: 

   Cognitive Paper-and-Pencil      --  Chapter 3, Pilot Tests, and
                                       Chapter 4, Field Test

   Perceptual/Psychomotor,
   Computer-Administered           --  Chapter 5, Pilot Tests, and
                                       Chapter 6, Field Test

   Non-Cognitive Paper-and-Pencil  --  Chapter 7, Pilot Tests, and
                                       Chapter 8, Field Test

Revisions of the measures after field testing, into the form to be 
used in Concurrent Validation, are described in Chapter 9. 

[Figure 1.5. Chart content not recoverable from the source. Legible entries
include Dependability, Conventional, Social, Enterprising, AVOICE (Army
Vocational Interest Career Examination), Target Tracking Test 2 (computer),
and Target Shoot (computer).]

*Final priority arrived at via consensus of March 1984 IPR attendants.


CHAPTER 2



TEST DATA COLLECTION: PROCEDURES AND SAMPLES 
Janis S. Houston



In this chapter, we describe the procedures used to collect data at 
the pilot and field test sites and report basic descriptive data about the
sample of soldiers that participated. 

PILOT TESTS 

Pilot Test #1: Fort Carson
Procedures 

On 17 April 1984, a sample of 43 soldiers at Fort Carson, Colorado,
participated in the first pilot testing of the Pilot Trial Battery. The
testing session ran from 0800 hours to 1700 hours, with two 15-minute
breaks (one mid-morning and one mid-afternoon) and a one-hour break for
lunch.

Groups of five soldiers at a time were randomly selected to take
computerized measures in a separate room while the remaining soldiers took
paper-and-pencil tests. When a group of five soldiers completed the
computerized measures, they were interviewed individually and collectively
about their reactions to the computerized tests, especially regarding
clarity of instructions, face validity of tests, sensitivity of items, and
their general disposition toward such tests. The soldiers then returned to
the paper-and-pencil testing session, and another group of five was
selected to take the computer measures.

Thus, not all the soldiers took all of the tests. The maximum N for
any single paper-and-pencil test was 38 (43 minus the 5 taking computer
tests). Computerized measures were administered to a total of 20
soldiers. The new paper-and-pencil cognitive tests in the Pilot Trial
Battery were each administered in two equally timed halves, to investigate
the Part 1/Part 2 correlations as estimates of test reliability.
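
The Part 1/Part 2 approach can be illustrated with a minimal sketch. It
assumes each soldier has one score on each separately timed half, and it
computes the Pearson correlation between the halves. The Spearman-Brown
step-up to full test length is a standard psychometric adjustment added
here as an assumption; the report does not say whether it was applied.
The scores and function names are hypothetical.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown(r_half):
    """Step a half-test correlation up to full test length (assumed step)."""
    return 2 * r_half / (1 + r_half)


# Hypothetical Part 1 and Part 2 scores for six soldiers on one test.
part1 = [10, 12, 9, 14, 11, 13]
part2 = [9, 13, 8, 15, 10, 12]

r_halves = pearson_r(part1, part2)
print(round(r_halves, 3), round(spearman_brown(r_halves), 3))
```

Because the halves are separately timed, this kind of estimate remains
usable for speeded tests, where within-administration odd/even splits
tend to overstate reliability.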

After actual test administration was completed, ten soldiers were 
selected to give specific, test-by-test feedback about paper-and-pencil 
tests in a small group session, while the remaining soldiers participated 
in a more general feedback and debriefing session. 



Tests Administered

Table 2.1 contains a list of all the tests administered at Fort Car- 
son, in the order in which they were administered, with the time limit and 
number of items for each test. These tests can be categorized as follows: 

    o  10 new cognitive paper-and-pencil measures

    o  9 marker tests for new paper-and-pencil cognitive measures

    o  7 computerized measures
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Table 2.1 

Pilot Tests Administered at Fort Carson, 17 April 1984

                                        Time
                                        Limit        No. of
Test                                    (Mins.)      Items        Type of Test

Paper-and-Pencil Tests

 1. Path Test                           [illegible]  [illegible]  New, Cognitive
 2. Reasoning Test 1                    14           30           New, Cognitive
 3. EAS Test 1 - Verbal Comprehension   5            30           Marker, Cognitive
 4. Orientation Test 1                  [illegible]  [illegible]  New, Cognitive
 5. Shapes Test                         [illegible]  [illegible]  New, Cognitive
 6. EAS Test 2 - Numerical Ability      10           75           Marker, Cognitive
 7. Object Rotation Test                7            [illegible]  New, Cognitive
 8. ETS Choosing a Path                 [illegible]  [illegible]  Marker, Cognitive
 9. Orientation Test 2                  [illegible]  [illegible]  New, Cognitive
10. Reasoning Test 2                    [illegible]  [illegible]  New, Cognitive
11. Orientation Test 3                  12           20           New, Cognitive
12. Assembling Objects Test             16           30           New, Cognitive
13. Maze Test                           9            24           New, Cognitive
14. Mental Rotations Test               10           20           Marker, Cognitive
15. ETS Hidden Figures                  14           16           Marker, Cognitive
16. ETS Map Planning                    [illegible]  [illegible]  Marker, Cognitive
17. ETS Figure Classification           8            14           Marker, Cognitive
18. EAS Test 5 - Space Visualization    [illegible]  [illegible]  Marker, Cognitive
19. FIT Assembly                        10           20           Marker, Cognitive

Computer Measures(a)

 1. Simple Reaction Time                None         15           New, Perceptual/Psychomotor
 2. Choice Reaction Time                None         15           New, Perceptual/Psychomotor
 3. Perceptual Speed & Accuracy         None         80           New, Perceptual/Psychomotor
 4. Tracing Test                        None         26           New, Perceptual/Psychomotor
 5. Short Memory Test                   None         50           New, Perceptual/Psychomotor
 6. Hidden Figures Test                 None         32           New, Perceptual/Psychomotor
 7. Target Shoot                        None         20           New, Perceptual/Psychomotor

(a) All computer measures were administered using a Compaq portable
microprocessor with a standard keyboard plus a commercially available
dual-axis joystick.
The new paper-and-pencil cognitive tests were tests newly developed by
[the researchers to measure the] constructs or abilities that had been
selected as important in earlier stages of the research (see Chapter 1).
Detailed descriptions of the development and analyses of these tests are
[provided in Chapters 3 and 4]. The marker tests were published tests that
were viewed as the closest or best measure of the selected abilities.

Sample Description

As previously mentioned, a total of 43 soldiers participated in Pilot
Test #1, with 20 soldiers completing the computerized measures and a
maximum of 38 completing individual paper-and-pencil tests. Table 2.2
presents a brief demographic description of the sample.

Table 2.2 

Description of Fort Carson Sample (N = 43)

1. Age:

     Mean = 22.76 years
     Median = 21.50 years
     Standard Deviation = 2.19

2. Current MOS:

     MOS   N          MOS   N          MOS   N
     19    8          27    2          31    1
     11    6          64    2          36    1
     13    5          76    2          71    1
     16    4          91    2          75    [illegible]
     98    3          96    2
     05    2          24    1

3. Sex:

     Males     33
     Females   10

4. Race:

     Black     10
     Asian      1
     White     24
     Hispanic   5
     Other      3

5. Years in the Service:
   (Computed from Date of Enlistment)

     Mean = 1.72
     Median = 1.55
     Standard Deviation = 1.10
Pilot Test #2: Fort Campbell



Procedures 

The second pilot testing session was conducted at Fort Campbell,
Kentucky, on 16 May 1984. A sample of 57 soldiers attended the 8-hour
session, and all 57 completed paper-and-pencil tests. No computerized
measures were administered at this pilot session. Once again, the ten new
cognitive tests were administered in two equally timed halves, to
investigate Part 1/Part 2 correlations.

Because we were still experimenting with time limits on the new cogni- 
tive tests, soldiers were asked to mark which item they were on when time 
was called for each of these tests, and then to continue to work on that 
part of the test until they finished. Finishing times were recorded for 
all the tests (Parts 1 and 2 separately, where appropriate). 

After test administration was completed, the group was divided. Ten
individuals were selected to provide specific feedback concerning the new
non-cognitive measures, and the remaining individuals provided feedback on
the new cognitive measures.

Tests Administered 

Table 2.3 lists all the tests and inventories administered at Pilot 
Test #2: Fort Campbell, along with the time limit and number of items for 
each. There were ten new cognitive tests with five cognitive marker tests, 
and two new non-cognitive inventories with one non-cognitive marker inven- 
tory. No computerized measures were administered. 

The two new non-cognitive inventories were developed by the researchers
to measure the constructs selected as important in earlier stages of
the research (see Chapter 1). The Assessment of Background and Life
Experiences (ABLE) measured temperament and biodata constructs, and the
Army Vocational Interest Career Examination (AVOICE) measured vocational
interests. The Personal Opinion Inventory (POI) was intended as a marker
inventory in that it contained published scales thought to measure the
constructs selected as important in the temperament domain. Detailed
descriptions of the rationale, development, and analyses of the new
non-cognitive inventories are provided in Chapters 7 and 8.

Sample Description

A total of 57 soldiers completed the Pilot Trial Battery as adminis- 
tered at Fort Campbell. A description of this sample's demographic make-up 
appears in Table 2.4. 

Pilot Test #3: Fort Lewis 

Procedures 

For the third pilot testing session, approximately 24 soldiers per day
for five days (11-15 June 1984) were available for testing at Fort Lewis,
Washington. Test sessions ran from 0800 hours to 1700 hours, with short
breaks in the morning and afternoon, and a one-hour lunch break. The
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Table 2.3 

Pilot Tests Administered at Fort Campbell, 16 May 1984

                                          Total
                                          Time
                                          Limit     No. of
Paper-and-Pencil Tests                    (Mins.)   Items   Type of Test

 1. Path Test                             9         44      New, Cognitive
 2. Reasoning Test 1                      14        30      New, Cognitive
 3. EAS Verbal Comprehension              5         30      Marker, Cognitive
 4. Orientation Test 1                    9         30      New, Cognitive
 5. Shapes Test                           16        54      New, Cognitive
 6. Object Rotation Test                  9         90      New, Cognitive
 7. Reasoning Test 2                      11        32      New, Cognitive
 8. Orientation Test 2                    8         20      New, Cognitive
 9. ABLE (Assessment of Background
    and Life Experiences)                 None      291     New, Non-Cognitive
10. Orientation Test 3                    12        20      New, Cognitive
11. Assembling Objects Test               16        40      New, Cognitive
12. Maze Test                             8         24      New, Cognitive
13. AVOICE (Army Vocational Interest
    Career Examination)                   None      306     New, Non-Cognitive
14. ETS Hidden Figures                    14        16      Marker, Cognitive
15. ETS Map Planning                      6         40      Marker, Cognitive
16. ETS Figure Classification             8         14      Marker, Cognitive
17. FIT Assembly                          10        20      Marker, Cognitive
18. POI (Personal Opinion Inventory)      None      121     Marker, Non-Cognitive
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Table 2.4 

Description of Fort Campbell Sample (N = 57)

1. Age:

     Mean = 21.40 years
     Median = 21 years
     SD = 3.07

2. Current MOS:

     MOS   N          MOS   N
     76    19         36    2
     63    11         71    2
     27     9         54    1
     52     9         62    1
     31     3

3. Sex:

     Males     46
     Females   11

4. Race:

     Black     15
     Asian      1
     White     36
     Hispanic   5

5. Years in the Service:
   (Computed from Date of Enlistment)

     Mean = 1.84
     Median = 1.67
     SD = 1.27




entire Pilot Trial Battery, including new cognitive and non-cognitive 
measures, was administered to all soldiers. To accomplish this, the sche- 
dule displayed in Table 2.5 was followed. 

Each day, the approximately 24 soldiers were divided into four groups
(labeled A, B, C, and D) of six soldiers each. While Group A took the
computerized measures, Groups B, C, and D took the first half of the
paper-and-pencil cognitive tests (labeled C1). While Group B took the
computerized measures, Groups A, C, and D took the second half of the
paper-and-pencil cognitive measures (labeled C2), and while Group C took
the computerized measures, Groups A, B, and D took the paper-and-pencil
non-cognitive measures (labeled NC). At approximately 1500 hours, each
group took that portion of the Pilot Trial Battery it had not yet received.
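
The rotation just described can be written out as a small sketch. It
assumes the four groups and three paper-and-pencil blocks named in the
text; the function name and data layout are illustrative only and do not
come from the report.

```python
GROUPS = ["A", "B", "C", "D"]
BLOCKS = ["C1", "C2", "NC"]  # paper-and-pencil blocks, in session order


def rotation_schedule(groups=GROUPS, blocks=BLOCKS):
    """Return one dict per session mapping each group to its activity."""
    schedule = []
    # Sessions 1-3: one group is on the computers, the rest take a block.
    for session, block in enumerate(blocks):
        on_computer = groups[session]
        schedule.append(
            {g: ("computer" if g == on_computer else block) for g in groups}
        )
    # Final session: each group takes the portion it missed earlier.
    final = {}
    for i, g in enumerate(groups):
        final[g] = blocks[i] if i < len(blocks) else "computer"
    schedule.append(final)
    return schedule


for number, session in enumerate(rotation_schedule(), start=1):
    print(number, session)
```

Under this layout every group receives C1, C2, NC, and the computerized
measures exactly once, while no more than six soldiers are at the
computer stations in any session.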

Once again, the new paper-and-pencil cognitive tests were administered 
in two equally timed halves to investigate Part 1/Part 2 correlations as 
estimates of test reliability. Individuals were not allowed any extra time 
to work on each test beyond the time limits, but finishing times were 
recorded for individuals completing tests before time was called. 

After completing the computerized battery, each soldier was asked
about his or her general reaction to the computerized battery, the clarity
and completeness of the instructions, perceived difficulty of the tests,
and ease of use of the response apparatus.
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Table 2.5 

Daily Schedule for Fort Lewis Pilot Testing(a)

Approximate
Time            Room 1                      Room 2                Room 3

0800 to 0815    A, B, C, D for
                Introduction, etc.

0815 to 1000    B, C, D take first half     A takes all
                of Cognitive Tests (C1)     computer measures

1015 to 1200    A, C, D take second half    B takes all
                of Cognitive Tests (C2)     computer measures

1300 to 1500    A, B, D take all Non-       C takes all
                Cognitive Measures (NC)     computer measures

1515 to 1700    A takes C1                  D takes all           B takes [C2]
                                            computer measures

                C takes NC (room assignment illegible)

(a) Each day the soldiers in the sample were divided into four groups of
approximately six soldiers each, referred to here as Groups A, B, C, and D.

Tests Administered 

The tests administered at Pilot Test #3 at Fort Lewis are listed in
Table 2.6, with the time limit and number of items in each test. A summary
of these tests follows:

    o  10 new, paper-and-pencil, cognitive tests

    o  4 marker, paper-and-pencil, cognitive tests

    o  2 new, paper-and-pencil, non-cognitive tests

    o  8 new, computerized, perceptual/psychomotor measures

Sample Description

Table 2.7 provides demographic information about the Fort Lewis sam- 
ple. A total of 118 soldiers participated in the pilot testing. 

Summary of Pilot Tests

The Pilot Trial Battery was initially developed in March 1984 and went
through three complete pilot testing iterations by August 1984. After each
iteration, observations noted during administration were scrutinized, data
analyzed, and results carefully examined. Revisions were made in specific
item content, test length, and time limits, where appropriate.

Table 2.8 summarizes the three Pilot Test sessions conducted during 
this period, with the total sample size for each, and the number and types 
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Table 2.6 

Pilot Tests Administered at Fort Lewis, 11-15 June 1984

Administration blocks: C1 and C2 (cognitive tests), NC (non-cognitive
inventories).

                                  Total
                                  Time      No. of
Test                              Limit     Items    Type of Test

Paper-and-Pencil Tests

  Path Test                       8         44       New, Cognitive
  Reasoning Test 1                12        30       New, Cognitive
  Orientation Test 1              10        30       New, Cognitive
  Shapes Test                     16        54       New, Cognitive
  Object Rotation Test            8         90       New, Cognitive
  Reasoning Test 2                10        32       New, Cognitive
  Maze Test                       6         24       New, Cognitive
  SRA Word Grouping               5         30       Marker, Cognitive
  Orientation Test 2              10        24       New, Cognitive
  Orientation Test 3              12        20       New, Cognitive
  Assembling Objects Test         16        40       New, Cognitive
  ETS Map Planning                6         40       Marker, Cognitive
  Mental Rotations Test           10        20       Marker, Cognitive
  DAT Abstract Reasoning          13        25       Marker, Cognitive
  ABLE                            None      268      New, Non-Cognitive
  AVOICE                          None      306      New, Non-Cognitive

Computerized Measures(a)

  Simple Reaction Time            None      15       New, Perceptual/Psychomotor
  Choice Reaction Time            None      15       New, Perceptual/Psychomotor
  Perceptual Speed & Accuracy     None      80       New, Perceptual/Psychomotor
  Target Tracking Test 1          None      18       New, Perceptual/Psychomotor
  Target Tracking Test 2          None      18       New, Perceptual/Psychomotor
  Target Identification Test      None      44       New, Perceptual/Psychomotor
  Memory Test                     None      50       New, Perceptual/Psychomotor
  Target (Shoot) Test             None      40       New, Perceptual/Psychomotor

(a) All computer measures were administered via a custom-made response
pedestal designed specifically for this purpose. No responses were made on
the computer keyboard. A Compaq microprocessor was used.
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Table 2.7 

Description of Fort Lewis Sample (N = 118)

1. Age:

     Mean = 22.82 years
     Median = 22.21 years
     SD = 4.2

2. Current MOS:

     MOS          N              MOS   N          MOS   N
     05B          2              54C   5          74D   1
     05C          5              54E   2          74F   1
     [illegible]  13             63B   4          75B   3
     11C          6              63J   1          75F   1
     11H          12             63W   1          76C   2
     [illegible]  1              64C   5          76P   1
     [illegible]  [illegible]    67V   4          76V   5
     [illegible]  [illegible]    67Y   2          76W   2
     19E          1              68G   1          76Y   6
     27E          1              68J   1          82C   2
     [illegible]  1              71L   4          83F   1
     [illegible]  [illegible]    72E   2          91B   5
     [illegible]  [illegible]    73C   1          94B   2

3. Sex:

     Males     97
     Females   22

4. Race:

     Black                   30
     Hispanic                14
     White                   66
     Asian                    3
     North American Indian    2
     Other                    1
     Blank                    2

5. Years in the Service:
   (Computed from Date of Enlistment)

     Mean = 2.55
     Median = 1.75
     SD = 2.90
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Table 2.8 

Summary of Pilot Testing Sessions for Pilot Trial Battery

Pilot                                      Total
Test                                       Sample
#      Location        Date                Size     No./Type of Tests Administered

1      Fort Carson     17 April 1984       43       10 New Cognitive
                                                     9 Marker Cognitive
                                                     0 New Non-Cognitive
                                                     0 Marker Non-Cognitive
                                                     7 Computerized Measures

2      Fort Campbell   16 May 1984         57       10 New Cognitive
                                                     5 Marker Cognitive
                                                     2 New Non-Cognitive
                                                     1 Marker Non-Cognitive
                                                     0 Computerized Measures

3      Fort Lewis      11-15 June 1984     118      10 New Cognitive
                                                     4 Marker Cognitive
                                                     2 New Non-Cognitive
                                                     0 Marker Non-Cognitive
                                                     8 Computerized Measures



of tests administered at each. Appendix F is a copy of the Pilot Trial
Battery as it was administered in June 1984 at Fort Lewis, and Appendix G
is a copy of the revised Pilot Trial Battery as it was administered in the
field tests during Fall 1984. (Both Appendix F and Appendix G are
contained in a separate limited-distribution report, ARI Research Note
87-24, as noted on page xlv.)
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FIELD TESTS 



The full Pilot Trial Battery was administered at Fort Knox in September
1984 in a formal field test to evaluate all of the component measures
and to analyze psychometric characteristics of the data obtained. In
addition, test-retest effects and practice effects were analyzed as part of
the Fort Knox field testing, and fakability studies were conducted at Fort
Bragg and the Minneapolis Military Entrance Processing Station (MEPS).

Field Test of Pilot Trial Battery: Fort Knox

The field test of the Pilot Trial Battery at Fort Knox was conducted 
to evaluate the psychometric characteristics of all of the measures in the 
battery, and to analyze the covariance of the measures with each other and 
with the ASVAB. 

Procedures 

Data collection was scheduled for four weeks at Fort Knox. During the
first two weeks, 24 soldiers were scheduled each day. On some days,
however, more than 24 soldiers arrived for testing. Because of the limited
availability of computer testing stations (only six), 24 soldiers was the
maximum number that could complete the entire battery. The "overflow"
soldiers, however, did complete all of the paper-and-pencil measures.

Each group of soldiers assembled at 0800. The testing sessions in- 
cluded two 15-minute breaks, and one hour was allowed for lunch. When the 
soldiers were assembled, they were divided into four groups if there were 
24 or fewer soldiers, and into five groups if there were more than 24 sol- 
diers. 

Figure 2.1 shows the daily schedule of testing for the first two weeks 
when the full Pilot Trial Battery was being field tested. Figure 2.2 shows 
the daily schedule in a different way, denoting the room assignments for 
each group of soldiers throughout the day. 

Figure 2.3 shows the schedule for weeks three and four, when the
test-retest and practice-effects studies were being conducted. Each soldier
from the first two weeks reported back for a half day of testing, either in
the morning (0800) or the afternoon (1300), exactly two weeks after his or
her week 1 or 2 session. The soldier then completed one-third of all the
paper-and-pencil tests (a re-test), and completed either the computer
"practice" session or the entire computer battery (a re-test).

Sample Description 

If 24 soldiers had appeared for each testing day and completed all 
tests as scheduled, we would have achieved the following sample sizes: 

    N = 240 for all cognitive and non-cognitive paper-and-pencil tests

    N = 240 for computer tests

    N = 80 retest of paper-and-pencil tests
0800    Rollcall. Divide 24 soldiers into four groups of six each, called
        A, B, C, and D. Overflow soldiers (N>24) were assigned to Group E.
        (This group's schedule is shown in Figure 2.2.)

0815    Read Introduction
        Read Privacy Act Statement
        Complete Soldier Information Sheet

        Test                        Time Limit

0830    Path Test                   8       Cognitive 1 Tests (C1)
        Reasoning Test 1            12
        Orientation Test 1          10      Groups B, C, D complete these.
        Shapes Test                 16      Group A completes computer tests.
        Object Rotation Test        7.5

1030    Reasoning Test 2            10      Cognitive 2 Tests (C2)
        Orientation Test 2          10
        Orientation Test 3          12      Groups A, C, D complete these.
        Assembling Objects Test     16      Group B completes computer tests.
        Maze Test                   5.5

1315    ABLE                        50      Non-Cognitive Instruments (NC)
        AVOICE                      35
                                            Groups A, B, D complete these.
                                            Group C completes computer tests.

1515    Final Sessions:

        Group A takes C1
        Group B takes C2
        Group C takes NC
        Group D takes computer tests

Figure 2.1. Daily testing schedule for Fort Knox Field Test, Weeks 1
and 2.
Approx Time    Room 1                     Room 2                     Room 3

0800           Assign soldiers to
               groups: 6 to A, 6 to B,
               6 to C, 6 to D,
               overflow to E.
               N = 24+

0815           ABCD for Introduction,                                E for Introduction,
               Privacy Act, & Soldier                                Privacy Act, & Soldier
               Info. Sheet                                           Info. Sheet
               N = 24                                                N = overflow, up to 24

0830 to 1015   B, C, D take C1            A takes computer tests     E takes C1
               N = 18                     N = 6

1030 to 1215   A, C, D take C2            B takes computer tests     E takes C2
               N = 18                     N = 6

1315 to 1500   A, B, D take NC            C takes computer tests     E takes NC
               N = 18                     N = 6

1515 to 1700   A takes C1                 D takes computer tests     B takes C2
               N = 6                      N = 6                      N = 6
                                          and
                                          C takes NC
                                          N = 6

Figure 2.2. Daily location schedule for Fort Knox Field Test, Weeks 1 and 2.



Daily Schedule for Weeks 3 and 4

Approx Time    Room 1                            Room 2

0800           Week 1: Morning Group A           Week 1: Morning Group B
               take paper-and-pencil retest*     take computer retest
               N = 6                             N = 6

1000           Week 1: Morning Group B           Week 1: Morning Group A
               take paper-and-pencil retest*     take computer practice effects
               N = 6                             N = 6

1300           Week 1: Afternoon Group A         Week 1: Afternoon Group B
               take paper-and-pencil retest*     take computer retest
               N = 6                             N = 6

1500           Week 1: Afternoon Group B         Week 1: Afternoon Group A
               take paper-and-pencil retest*     take computer practice effects
               N = 6                             N = 6

*Each paper-and-pencil retest session received one of the following:
C1, C2, or NC. Groups were cycled through all three in that order
and the cycle was repeated; i.e., Monday at 0800 is C1, at 1000 is C2,
at 1300 is NC, at 1500 is C1; Tuesday at 0800 is C2, etc.

Figure 2.3. Daily schedule for Fort Knox Field Test, Weeks 3 and 4.
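
The cycling rule in the footnote to Figure 2.3 can be sketched as follows.
The sketch assumes four retest sessions per day at the listed start times
and a three-block cycle that carries over from one day to the next; the
function name is hypothetical.

```python
from itertools import cycle

BLOCKS = ("C1", "C2", "NC")                # paper-and-pencil retest blocks
SLOTS = ("0800", "1000", "1300", "1500")   # daily retest session start times


def retest_blocks(n_days):
    """Assign a block to each session, continuing the cycle across days."""
    blocks = cycle(BLOCKS)
    return [[(slot, next(blocks)) for slot in SLOTS] for _ in range(n_days)]


for day, sessions in enumerate(retest_blocks(3), start=1):
    print(day, sessions)
```

Because four daily sessions are paired with a three-block cycle, the
block given at 0800 shifts by one each day, so each block appears in each
time slot once every three days.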
    N = 120 retest of computer tests

    N = 120 practice effects on computer tests

However, due to the usual exigencies of data collection in the field, 
there was some deviation from these targets. On some days fewer than 24 
soldiers appeared, and on other days more than 24 soldiers appeared. In 
addition, we were able to schedule one additional testing day. Finally, 
some soldiers were unable to complete all the testing due to family or 
other emergencies. Therefore, the following samples were obtained: 

    N = 292 completed all cognitive and non-cognitive paper-and-pencil
    tests

    N = 256 completed computer tests

    N = 112-129 completed retest of paper-and-pencil tests (N varied
    across tests)

    N = 113 completed retest of computer tests

    N = 74 completed practice effects on computer tests
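
The planned and obtained sample sizes can be compared with a short
sketch. The count of ten initial testing days is an assumption (two
five-day weeks), chosen because it reproduces the planned N of 240; the
one-third and one-half splits reflect one reading of the retest design
described above. Dictionary keys and the helper name are hypothetical.

```python
SOLDIERS_PER_DAY = 24
TEST_DAYS = 10  # assumption: two five-day weeks of initial testing

full_sample = SOLDIERS_PER_DAY * TEST_DAYS

planned = {
    "paper_and_pencil": full_sample,
    "computer": full_sample,
    # Each returning soldier retakes one of three blocks (C1, C2, or NC).
    "paper_and_pencil_retest": full_sample // 3,
    # Returning soldiers split between computer retest and practice.
    "computer_retest": full_sample // 2,
    "computer_practice": full_sample // 2,
}

# Obtained sample sizes as reported in the text.
obtained = {
    "paper_and_pencil": 292,
    "computer": 256,
    "paper_and_pencil_retest": (112, 129),  # N varied across tests
    "computer_retest": 113,
    "computer_practice": 74,
}


def difference(key):
    """Obtained minus planned N; a (low, high) pair when N varied."""
    got = obtained[key]
    if isinstance(got, tuple):
        return tuple(g - planned[key] for g in got)
    return got - planned[key]


for key in planned:
    print(key, planned[key], obtained[key], difference(key))
```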

Table 2.9 shows the race and gender makeup for Fort Knox soldiers
completing at least part of the Pilot Trial Battery. Table 2.10 shows the
sample distribution by MOS code. The mean age of the participating
soldiers was 21.9 years (SD = 3.1). The mean years in service, computed
from date of enlistment in the Army, was 1.6 years (SD = 0.9).



Table 2.9

Race and Gender of Fort Knox Field Test Sample of the Pilot Trial Battery

     Race                 Frequency

     White                156
     Hispanic              24
     Black                121
     American Indian        2

     Total                303

     Sex                  Frequency

     Female                57
     Male                 246
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Table 2.10 

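Military Occupational Specialties of Fort Knox Field Test Sample
of the Pilot Trial Battery

     MOS          N              MOS   N

     05B          1              63N   [illegible]
     11D          [illegible]    63T   3
     11C          3              63W   [illegible]
     12B          16             63Y   [illegible]
     13B          14             64C   10
     13E          1              67G   1
     [illegible]  19             71D   [illegible]
     19E          29             71G   [illegible]
     19K          10             71L   21
     31J          2              71M   [illegible]
     31S          2              71N   1
     31V          3              72E   [illegible]
     35E          1              73C   [illegible]
     36C          1              75B   [illegible]
     36K          2              75D   [illegible]
     41C          1              75F   1
     [illegible]  1              76C   11
     44B          1              76P   [illegible]
     44E          2              76V   [illegible]
     45B          1              76W   [illegible]
     45G          1              76Y   38
     [illegible]  1              81E   [illegible]
     45N          8              82C   1
     45T          1              84B   [illegible]
     51B          3              91B   [illegible]
     51N          1              91E   [illegible]
     52D          1              92B   [illegible]
     55B          [illegible]    93F   [illegible]
     57E          1              94B   [illegible]
     62B          2              94F   [illegible]
     62E          1              95B   15
     63B          8              96B   [illegible]
     63D          1
     63E          4
     63J          1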
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Additional Field Testing 



As noted previously, field tests were conducted at three sites. The
sites and the basic purpose of the field test at each site were as follows:

Fort Knox. The full Pilot Trial Battery was administered here, as
described above.

Fort Bragg. The non-cognitive Pilot Trial Battery measures,
Assessment of Background and Life Experiences (ABLE) and Army Vocational
Interest Career Examination (AVOICE), were administered to soldiers at Fort
Bragg under several experimental conditions in order to estimate the extent
to which scores on these instruments could be altered or "faked" when
persons are instructed to do so. Information on procedures and sample is
contained in Chapter 8.

Minneapolis Military Entrance Processing Station (MEPS). The
non-cognitive measures were administered to a sample of soldiers as they
were being processed into the Army in order to estimate how persons might
alter their scores in an actual applicant setting. Information on
procedures and sample is contained in Chapter 8.

Summary 

The field test was completed in September 1984. Appendix G contains a
copy of the Pilot Trial Battery as it was administered during the field
tests.

The remaining chapters in this report describe the development of the
Pilot Trial Battery measures, the analyses of the pilot test and field test
data, and the revisions made to the battery based on those analyses.
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CHAPTER 3 

COGNITIVE PAPER-AND-PENCIL MEASURES: PILOT TESTING

Jody L. Toquam, Marvin D. Dunnette, VyVy A. Corpuz,
Janis S. Houston, Norman G. Peterson, Teresa L. Russell,

and Mary Ann Hanson



GENERAL 

This chapter deals with the cognitive paper-and-pencil measures
developed for inclusion in the Pilot Trial Battery. As described in
Chapter 1, the Task 2 research team, including contractor personnel, Army
Research Institute monitors, and designated members of the Scientific
Advisory Group, had previously evaluated and prioritized cognitive ability
constructs or predictor categories according to their relevance and
importance for predicting success in a variety of Army MOS (see Figure
1.5). These priority ratings were used to plan cognitive paper-and-pencil
test development activities.

Before describing the development of the tests, we outline some issues 
and objectives germane to all the cognitive paper-and-pencil measures. 
Each cognitive predictor category is then discussed in turn. 

Within each category, we provide a definition of the target cognitive
ability. Next, for each test developed to measure the target ability, we
outline the strategy followed; this included identifying (1) the target
population or target MOS for which the measure is hypothesized to most
effectively predict success, (2) published tests that served as markers for
each new measure, (3) intended level of item difficulty, and (4) type of
test (i.e., speed, power, or a combination). The test itself is then
described and example items are provided. Results from the first two pilot
test administrations or tryouts are reported to explain and document
subsequent revisions. Finally, psychometric test data from the third pilot
test, conducted at Fort Lewis, are discussed and the form of the test
decided upon for field testing is described.

The last portion of this chapter presents a summary and analysis of
the newly developed cognitive ability tests. This includes a discussion of
test intercorrelations, results from a factor analysis of the
intercorrelations, and results from subgroup analyses of test scores from
the pilot test at Fort Lewis. Field testing of these measures is then
described in Chapter 4.



Target Population

The population for which these tests have been developed is the same
one to which the Army applies the ASVAB, that is, persons applying to
enlist in the Army. This is, speaking very generally, a population made up
of recent high school graduates, not entering college, from all geographic
sections of the United States. Non-high-school graduates may be accepted
into the Army, but present policy gives preference to high school
graduates. For a number of reasons, Army applicants are probably not a
truly random sample of all recent high school graduates, but for initial
test development activities a highly refined specification of Army
applicants was not necessary, and was not attempted.

Another point to be made about the target population is the fact that
it was, practically speaking, inaccessible to us during our development
process. We were constrained to use enlisted soldiers to try out the newly
developed tests. Enlisted soldiers, of course, represent a restricted
sample of the target population in that they all have passed enlistment
standards; furthermore, almost all of the soldiers that we were able to use
in our pilot tests had also passed Basic and Advanced Individual Training.
Thus, the persons in our samples are presumably more qualified, more able,
more persevering, and so forth, on the average, than are the persons in the
target population.

The above discussion leads up to two major implications that served as
general guidelines for our development and pilot testing activities:

    (1) The tests to be developed will be applied to a population with a
        large range of abilities. Therefore, we should attempt to develop
        tests each of which has a broad range of item difficulties.
        Highly peaked tests, in the sense that all items would have
        difficulty levels near a certain value (e.g., .50, indicating
        that half the examinees would answer correctly), were not our
        goal.

    (2) The soldiers upon whom the tests will be initially tried out are
        generally higher in ability than the target population.
        Therefore, the tests should be somewhat easier than they would be
        if we had access to an unrestricted sample of the target
        population in trying out the tests. With regard to this point, we
        point out the somewhat confusing nature of the technical term
        "difficulty level." This term is defined as the proportion of
        persons attempting an item who answer the item correctly. Thus, a
        high item difficulty level (say .90) means the item is relatively
        easy, whereas a low item difficulty level (say .10) means the
        item is relatively hard. When used in reference to an entire
        test, it is usually defined as the proportion of the total number
        of items that are answered correctly, on the average. Thus, a
        test difficulty level of .75 means that, on the average, persons
        taking the test answer 75% of the items correctly.

Power vs. Speed

The above discussion of the target population shows how we derived
some general guidelines about the difficulty level of the tests and their
items. Another decision to be made about each test was its placement on
the power vs. speed continuum. This decision is, of course, linked to the
test difficulty issue, since a relatively easy test can usually be made
difficult simply by reducing the time allowed to take the test.

Very few tests used in practical testing situations are pure power
tests, but quite a few are highly speeded tests. Most psychometricians
would agree that a "pure" power test is a test administered in such a way
that all persons taking the test are allowed enough time to attempt all
items on the test, and that a "pure" speeded test is a test administered in
such a way that no one taking the test has enough time to attempt all of
the items. In practice, there appears to be a power/speed continuum, and
most tests fall somewhere between the two extremes on this continuum. It
also is the case that a power test usually contains items that not all
persons will be able to answer correctly, even given unlimited time to
complete the test, while a speeded test usually contains items that all or
almost all persons could answer correctly, given enough time to attempt the
items.

As a matter of practical definition for this developmental effort, we 
used an "80% completion" rule-of-thumb to define a power test. That is, if 
a test could be completed by 80 percent of all those taking the test, then 
we considered it a "power" test. Tests with completion rates lower than 
this were considered to have some "speededness" determining performance on 
the test. 
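
A minimal sketch of how the 80% completion rule of thumb might be applied
is given here. It assumes that examinees work through the items in order,
so that the number of the last item reached describes how far each person
got; the function names and the data are hypothetical.

```python
def completion_rate(last_item_reached, n_items):
    """Fraction of examinees who reached the final item of the test."""
    finished = sum(1 for last in last_item_reached if last >= n_items)
    return finished / len(last_item_reached)

def classify_test(last_item_reached, n_items, threshold=0.80):
    """Label a test 'power' if at least `threshold` of examinees completed it."""
    rate = completion_rate(last_item_reached, n_items)
    return "power" if rate >= threshold else "some speededness"

def last_item_completed_by(last_item_reached, proportion=0.80):
    """Highest item number reached by at least `proportion` of the sample."""
    n = len(last_item_reached)
    best = 0
    for item in range(1, max(last_item_reached) + 1):
        share = sum(1 for last in last_item_reached if last >= item) / n
        if share >= proportion:
            best = item
    return best

# Hypothetical 20-item test taken by ten examinees.
reached = [20, 20, 20, 20, 18, 16, 20, 20, 12, 20]
print(completion_rate(reached, 20))
print(classify_test(reached, 20))
print(last_item_completed_by(reached))
```

The third function corresponds to a statistic of the kind "last item
completed by 80% of the sample," which gives a finer picture of speededness
than the single power/speeded label.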

The Pilot Trial Battery contains cognitive ability tests that may be 
considered power tests, and tests that may be categorized as highly speeded 
tests, using the above definition. It also contains tests that may be 
viewed as combinations of both power and speed. Each test is defined below
as a power, speeded, or combination test according to the development
strategy employed. 

Reliability 

A final issue related to evaluation of test construction procedures is
test reliability. Several procedures are available to assess the reli-
ability of a measure and each provides distinctive information about a
test. Internal consistency estimates are used to assess homogeneity of
test content; high values indicate that test items are measuring the same
ability or abilities. Test-retest procedures are used to estimate the
stability of test scores across time; high values indicate that the test
yields the same or very similar scores for each subject over time.

Split-half reliability estimates were obtained for each paper-and-
pencil test administered at the pilot test sites: Fort Carson, Fort
Campbell, and Fort Lewis. For each tryout, each test was administered in
two separately timed parts. Reliability estimates were obtained by cor-
relating scores from the two parts, and the Spearman-Brown correction
procedure was then used to estimate the reliability for the whole test.
The separately timed, split-half reliability estimates, corrected by the
Spearman-Brown procedure, are reported for each test. This estimate of
reliability is appropriate for either speeded or power tests.
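
The split-half procedure just described can be sketched as follows. This
is an illustration under stated assumptions, not the analysis code used in
the project: the function names and the part scores are hypothetical, and
the correction shown is the standard Spearman-Brown formula,
r_whole = k * r / (1 + (k - 1) * r), with k = 2 for a test made of two
equal-length parts.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_brown(r_parts, length_factor=2.0):
    """Step a part-part correlation up to the reliability of the longer test."""
    return length_factor * r_parts / (1.0 + (length_factor - 1.0) * r_parts)

def split_half_reliability(part1, part2):
    """Correlate the two separately timed parts, then apply the correction."""
    return spearman_brown(pearson(part1, part2))

# Hypothetical Part 1 and Part 2 scores for six examinees.
part1 = [14, 9, 17, 12, 19, 8]
part2 = [15, 11, 16, 10, 18, 9]
print(round(split_half_reliability(part1, part2), 2))

# A part-part correlation of .65 steps up to about .79 for the whole test.
print(round(spearman_brown(0.65), 2))
```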

Further, we also report Hoyt internal consistency reliability esti- 
mates for each test. This method provides the average reliability across 
all possible split-test halves. We point out that this procedure is inap- 
propriate for speeded tests because it overestimates the reliability, but 
in the interest of complete reporting the Hoyt reliability estimate has 
been calculated for all tests.

Predictor Categories

We turn now to the description of the tests, which are discussed
within cognitive ability constructs. The four constructs treated in cogni-
tive paper-and-pencil tests were spatial visualization, field independence,
spatial orientation, and induction/figural reasoning.

SPATIAL VISUALIZATION

Spatial visualization involves the ability to mentally manipulate com-
ponents of two- or three-dimensional figures into other arrangements. The
process involves restructuring the components of an object and accurately
discerning their appropriate appearance in new configurations. This con-
struct includes several subcomponents, two of which are:

o Rotation - the ability to identify a two-dimensional figure when seen
at different angular orientations within the picture plane. It also
includes three-dimensional rotation or the ability to identify a
three-dimensional object projected on a two-dimensional plane, when
seen at different angular orientations either within the picture plane
or about the axis in depth.

o Scanning - the ability to visually survey a complex field to find a
particular configuration representing a pathway through the field.

Visualization constructs had been given a mean validity estimate
of .21 across all criterion constructs by our expert panel.¹ The highest
mean validity estimate for visualization measures was .25 for criterion
clusters involving Technical Skills.

Currently, no ASVAB measures are designed specifically to measure 
spatial abilities. For this reason, spatial visualization received a 
priority rating of one (see Figure 1.5), and development of spatial ability 
measures was strongly emphasized. The visualization construct was divided 
into two areas: visualization/rotation and visualization/scanning. We 
developed two tests to tap abilities within each of these areas; these four
tests are described below. 

Spatial Visualization - Rotation 

The rotation component of spatial visualization requires the ability 
to mentally restructure or manipulate parts of a two- or three-dimensional 
figure. We developed two tests of this ability: Assembling Objects and
Object Rotation. The former involves three-dimensional figures, and the 
latter involves two-dimensional objects. 

Assembling Objects Test 

Development Strategy. Predictive validity estimates provided by ex-
pert raters suggest that measures of the visualization/rotation construct
would be effective predictors of success in MOS that involve mechanical
operations (e.g., inspect and troubleshoot mechanical systems, inspect and
troubleshoot electrical systems), construction (e.g., construct wooden
buildings, construct masonry structures), and drawing or using maps. Thus,
the Assembling Objects Test was designed to yield information about the
potential for success in MOS involving mechanical or construction activi-
ties.

¹ This panel was the group of 35 personnel psychologists who estimated the
relationships between a set of ability constructs and a set of Army cri-
terion constructs. See Chapter 1 of this report; also Wing, Peterson,
and Hoffman (1984).

Published tests identified as markers² for Assembling Objects include
the Employee Aptitude Survey (EAS-5) Space Visualization and the Flanagan
Industrial Test (FIT) Assembly. EAS-5 requires examinees to count three-
dimensional objects depicted in two-dimensional space, whereas the FIT
Assembly involves mentally piecing together objects that are cut apart or
disassembled. The FIT Assembly was selected as the more appropriate marker
for our purposes because it has both visualization and rotation components
for mechanical or construction activities. Thus, we designed the As-
sembling Objects Test to assess the ability to visualize how an object will
look when its parts are put together correctly.

Multiple-choice test items were constructed to tap this ability at
several difficulty levels ranging from very easy items to more difficult
items. It was determined that this measure would combine power and speed
components, with speed receiving greater emphasis.

Test Development. In the original form of the Assembling Objects 
Test, subjects were asked to complete 30 items within a 16-minute time 
limit. Each item presented subjects with components or parts of an object. 
The task was to select from among four alternatives the one object that de- 
picted the components or parts put together correctly. Two item types were 
included in the test; examples of each are shown in Figure 3.1.

Figure 3.1. Sample items from Assembling Objects Test.

² As mentioned in Chapter 2, marker tests were published tests that were
judged to measure the predictor categories or constructs for which we
were developing tests. Some of these marker tests were actually adminis-
tered during pilot testing, others were not, but they were all studied to
assist in developing the new tests.

The first tryout, conducted at Fort Carson, indicated that the test
may have suffered from ceiling effects. That is, nearly all recruits in
this sample (N = 36) completed the test; the mean score was 24.2 (SD =
5.05). Further, item difficulty levels were somewhat higher than intended
(mean = .80, SD = .12, median = .83); that is, the proportion of examinees
obtaining high scores was greater than expected.

Therefore, ten new, more difficult items, five for each item type,
were added to the test to reduce the likelihood of ceiling effects. The
16-minute time limit was retained for the second tryout, at Fort
Campbell. In this sample, [completion figures illegible] completed the
test; the mean score was 26.3 (SD = 8.34). Item difficulty levels were
lower for the revised test (mean = .68, SD = [illegible],
median = .72). Inspection of these results indicated that the test
possessed acceptable psychometric qualities, so no further changes were
made in preparation for the Fort Lewis pilot test.

Pilot Test Results. Fort Lewis results for the Assembling Objects
Test are shown in Table 3.1. The test contains 40 items with a 16-minute
time limit; individual test scores were computed using the total number
correct. The mean number of items completed was 37.6, with a range of 18
to 40. Corresponding values for number correct (or test score) were 28.1
and 7-40.

Parts 1 and 2 correlate .65 with each other. Reliabilities are esti- 
mated at .79 by split-half methods (Spearman-Brown corrected), and .89 with 
Hoyt's estimate of reliability. 
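
For readers unfamiliar with Hoyt's estimate, the following sketch shows
one way it could be computed from a persons-by-items matrix of 0/1 item
scores. It assumes the usual analysis-of-variance formulation, in which
reliability is (MS persons - MS error) / MS persons; this quantity is
algebraically the same as coefficient alpha (KR-20 for right/wrong items).
The function name and the data are hypothetical.

```python
def hoyt_reliability(scores):
    """Hoyt's ANOVA reliability for a persons x items matrix of item scores."""
    n_persons = len(scores)
    n_items = len(scores[0])
    grand_total = sum(sum(row) for row in scores)
    correction = grand_total ** 2 / (n_persons * n_items)

    # Sums of squares for a two-way layout without replication.
    ss_total = sum(x * x for row in scores for x in row) - correction
    ss_persons = sum(sum(row) ** 2 for row in scores) / n_items - correction
    item_totals = [sum(row[j] for row in scores) for j in range(n_items)]
    ss_items = sum(t * t for t in item_totals) / n_persons - correction
    ss_error = ss_total - ss_persons - ss_items

    ms_persons = ss_persons / (n_persons - 1)
    ms_error = ss_error / ((n_persons - 1) * (n_items - 1))
    return (ms_persons - ms_error) / ms_persons

# Hypothetical data: five examinees, four items (1 = correct, 0 = incorrect).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(hoyt_reliability(scores), 2))  # -> 0.8
```

Because unreached items at the end of a speeded test are scored alike for
many examinees, an internal consistency index of this kind tends to run
higher than the separately timed split-half estimate on speeded tests.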

For the total test, item difficulties (see Figure 3.2) range from .31
to .92 with a mean of .70. We also computed the correlation of scores on
each item (0 = incorrect, 1 = correct) with total scores (the number of
items answered correctly). This index, usually called the item-total
correlation, measures the degree to which each item is measuring the same
ability or abilities as the other items on the test. The higher the value
of this index, the "better" the item. Values of .25 or better are usually
considered acceptable, though lower values are not necessarily unaccept-
able. Item-total correlations for Assembling Objects range from .18 to .60
with a mean of .44 (SD = 9.99).
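
The item-total index described above can be sketched as follows. As in
the text, each item (scored 0 or 1) is correlated with the total number
correct, with the item itself left in the total. The function names, the
.25 screening value used in the example, and the data are illustrative
assumptions only.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns NaN if either variable has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)

def item_total_correlations(scores):
    """Correlate each 0/1 item score with the total number correct."""
    totals = [sum(row) for row in scores]
    n_items = len(scores[0])
    return [pearson([row[j] for row in scores], totals) for j in range(n_items)]

# Hypothetical data: five examinees, four items.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
r_values = item_total_correlations(scores)
print([round(r, 2) for r in r_values])

# Items falling below the conventional .25 guideline would be reviewed.
print([j + 1 for j, r in enumerate(r_values) if r < 0.25])
```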

Correlations between scores on this measure and scores on other Pilot
Trial Battery paper-and-pencil measures are reported at the end of this
chapter. It is important, however, to note the correlations between this
test and its marker tests. Both marker tests were administered in the Fort
Carson tryout and the FIT Assembly was also used at Fort Campbell. Results
from Fort Carson indicate that scores on the Assembling Objects Test cor-
relate .74 with scores on EAS-5 and .7[illegible] with scores on FIT
Assembly. Results from Fort Campbell indicate that this test correlates
.64 with FIT Assembly. This last value represents a better estimate of the
relationship between Assembling Objects and the FIT Assembly marker,
because of the revisions made to Assembling Objects following the first
tryout at Fort Carson. Given the sample sizes involved and the goals for
the Assembling Objects Test, the .64 correlation was encouraging.

Modifications for the Fort Knox Field Test. In preparation for the
Fort Knox administration, some Assembling Objects items were redrawn to
clarify the figures. The item response format was modified to a form that
could be used for machine scoring (i.e., the subject was instructed to fill
in a circle for the correct answer). This change was made in all of the
tests being prepared for field test administration.

Table 3.1

Pilot Test Results from Fort Lewis: Assembling Objects Test

                                         Total    Part 1    Part 2

Number of Items                             40        20        20
Time Allowed (minutes)                      16         8         8
Number of Subjects                         118       118       118

Number of Items Completed
  Mean                                   37.58     18.23     19.36
  Standard Deviation                      3.83      2.59      2.12
  Range                                  18-40     10-20      6-20

Last Item Completed by 80%
  of the Sample                            N/A        16        20

Percentage of Subjects
  Completing All Items                     48%       56%       80%

Number of Items Correct
  Mean                                   28.14     13.86     14.29
  Standard Deviation                      7.51      4.18      4.09
  Range                                   7-40      3-20      3-20

Total-Part Intercorrelations
  Total                                     **       .91       .90
  Part 1                                              **       .65
  Part 2                                                        **

Split-Half Reliability (Spearman-Brown Corrected) = .79
Hoyt Internal Consistency = .89

Figure 3.2. Distribution of item difficulty levels: Assembling Objects
Test. (Histogram of the number of items at each difficulty level, plotted
against item difficulty level, i.e., proportion passing. Mean = .70,
SD = .16, Range = .31 - .92. NOTE: Number of items in the test = 40.)

Object Rotation Test 

Development Strategy. Object Rotation is the second test developed to
measure spatial visualization/rotation. This measure is also expected to 
predict success in MOS involving mechanical operations, construction activ- 
ities, and drawing or using maps. 

Published tests serving as markers for this measure include Educa-
tional Testing Service (ETS) Card Rotations, Thurstone's Flags Test, and
the Shepard-Metzler Mental Rotations. Each of these measures requires the
subject to compare a test object with a standard object to determine whe-
ther the two represent the same figure with one simply turned or rotated or
whether the two represent different figures. The first two measures, ETS
Card Rotations and Thurstone's Flags, involve visualizing two-dimensional
rotation of an object, whereas the Mental Rotations test requires visualiz-
ing three-dimensional objects depicted in two-dimensional space.

Object Rotation Test items were constructed to reflect a limited range 
of item difficulty levels ranging from very easy to moderately easy. These 
items, on the average, were designed to be easier than those appearing in 
the Assembling Objects Test. Further, we planned to construct a test that
contains more items and has a shorter time limit than the Assembling Ob-
jects Test. Thus, the plan for Object Rotation was to develop a test that
falls more toward the speeded end of the power-speed continuum.

Test Development. As initially developed, the Object Rotation Test
contained 60 items with a 7-minute time limit. The subject's task was to
examine a test object and determine whether the figure represented in each
item is the same as the test object, only rotated, or is not the same as
the test object (e.g., is flipped over). For each test object there are
five test items, each requiring a response of "same" or "not same." Sample
test items are shown in Figure 3.3.

The Fort Carson tryout indicated that this test suffered from ceiling
effects. Subjects (N = 38), on the average, completed 59.3 (SD = 2.60) of
the 60 items and obtained a mean score of 55.6 (SD = 6.06). Item diffi-
culty levels averaged .92 (SD = .05). Consequently, we decided to add 30
new items to the test and to increase the time limit to 9 minutes for the
second tryout at Fort Campbell.

In the second tryout, subjects, on the average, completed 87.6
(SD = 7.96) of the 90 items and obtained a mean score of 77.0 (SD = 12.1).
The time limit was reduced to 8 minutes for the Fort Lewis administration,
in order to obtain a more highly speeded test.

Pilot Test Results. Detailed results from the Fort Lewis pilot test
are shown in Table 3.2. As reported in the table, the number of items
completed was fairly high (mean = 82.6), with a range of 48 to 90. Test
scores, computed by the total number correct, range from 36 to 90 with a
mean of 73.4.

Figure 3.3. Sample test items from Object Rotation Test.

Table 3.2

Pilot Test Results from Fort Lewis: Object Rotation Test

                                         Total    Part 1    Part 2

Number of Items                             90        45        45
Time Allowed (minutes)                       8         4         4
Number of Subjects                         118       118       118

Number of Items Completed
  Mean                                   82.64     40.52     42.12
  Standard Deviation                     10.79      6.73      5.56
  Range                                  48-90     21-45     18-45

Last Item Completed by 80%
  of the Sample                            N/A        35        40

Percentage of Subjects
  Completing All Items                     52%       50%       67%

Number of Items Correct
  Mean                                   73.36     36.64     36.72
  Standard Deviation                     15.40      8.69      7.77
  Range                                  36-90     13-45      7-45

Total-Part Intercorrelations
  Total                                     **       .94       .93
  Part 1                                              **       .75
  Part 2                                                        **

Split-Half Reliability (Spearman-Brown Corrected) = .85
Hoyt Internal Consistency = .96

Figure 3.4. Distribution of item difficulty levels: Object Rotation
Test. (Histogram of the number of items at each difficulty level, plotted
against item difficulty level, i.e., proportion passing. Mean = .82,
SD = .11. NOTE: Number of items in the test = 90.)

Item difficulty levels (see Figure 3.4) range from .59 to .98 with a
mean of .81. Item-total correlations averaged .44 (SD = .17), ranging
from .09 to .79. Parts 1 and 2 correlated .75 with each other. The split-
half reliability estimate, corrected for test length, is .86 while the Hoyt
estimate is .96.

The marker test for Object Rotation, Mental Rotations, was adminis-
tered at two of the three pilot test sites. Data collected at the Fort
Carson tryout indicate that the two measures correlate .60 (N = 30);
data from the Fort Lewis administration indicate the two correlate .56
(N = 118). This was viewed as an acceptable level of relationship.

Modifications for the Fort Knox Field Test. Results from the Fort
Lewis pilot test indicated that the Object Rotation Test items possessed
desirable psychometric properties. Number of items completed, item diffi-
culties, and item-total correlations were nearly all acceptable. However,
the time limit was decreased to 7 1/2 minutes to make the test more speeded
and avoid a possible ceiling effect. Also, as noted earlier, the response
format was modified to one that could be used for machine scoring.

Spatial Visualization - Scanning

A second component of spatial visualization ability which was em- 
phasized in predictor development is spatial scanning. Spatial scanning 
tasks require the subject to visually survey a complex field and find a 
pathway through it, utilizing a particular configuration. The Path Test 
and the Maze Test were developed to measure this component of spatial 
visualization. 

Path Test 

Development Strategy. Validity estimates provided by the expert 
rating panel suggested that a measure of visualization/scanning would be 
most effective in predicting success for Army MOS involving electrical or 
electronic operations (e.g., troubleshooting electrical systems, inspecting 
and troubleshooting electronic systems), using maps in the field (e.g., 
planning placement of tactical positions), and controlling air traffic. 

Published tests serving as markers for construction of the Path Test 
include Educational Testing Service's Map Planning and Choosing a Path. In 
these measures, examinees are provided with a map or diagram. The task is 
to follow a given set of rules or directions to proceed through the pathway 
or to locate an object on the map. 

Results from the Preliminary Battery research with the marker tests, 
ETS Map Planning and ETS Choosing a Path, indicated that both tests are 
highly speeded and were very difficult for the target sample (Hough, 
Dunnette, Wing, Houston, & Peterson, 1984). For example, 80 percent of the
subjects (N = 1,843 Army recruits) completed only 16 of the 40 items
contained in the Map Planning test. The mean score for this group was 18.1
(SD = 16.5). For Choosing a Path, 80 percent of the subjects completed
only six of the 16 items. This group obtained a mean score of 4.96
(SD = 3.35).

These data suggested that the Path Test should contain items somewhat
less difficult than the ETS tests or provide more time for completion of 
items at a similar difficulty level. Consequently, Path Test items were 
constructed to yield difficulty levels for the target population ranging 
from very easy to somewhat difficult, and the test time was established to 
place more emphasis on speed than on power. 

Test Development. The Path Test requires subjects to determine the 
best path or route between two points. Subjects are presented with a map 
of airline routes or flight paths. Figure 3.5 shows a flight path with 
four sample items. The subject's task is to find the "best" path, that is,
the path between two points that requires the fewest stops. Each lettered 
dot is a city that counts as one stop; the beginning and ending cities 
(dots) do not count as stops. 
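
The keyed answer to an item of this kind can be derived mechanically. The
sketch below uses a breadth-first search over a small, entirely
hypothetical route map (it is not one of the maps used in the test) to
find the fewest intermediate stops between two cities; the function name
is likewise an assumption made for illustration.

```python
from collections import deque

def fewest_stops(routes, start, end):
    """Minimum number of intermediate cities between start and end.

    `routes` maps each city to the cities directly connected to it.
    The beginning and ending cities do not count as stops.
    Returns None if no path exists.
    """
    if start == end:
        return 0
    seen = {start}
    queue = deque([(start, 0)])  # (city, legs flown to reach it)
    while queue:
        city, legs = queue.popleft()
        for nxt in routes.get(city, ()):
            if nxt == end:
                # Every city visited before this final leg, except the
                # starting city, is a stop; that count equals `legs`.
                return legs
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, legs + 1))
    return None

# Hypothetical flight map with six lettered cities.
routes = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D", "E"],
    "D": ["B", "C", "F"],
    "E": ["C", "F"],
    "F": ["D", "E"],
}
print(fewest_stops(routes, "A", "C"))  # direct connection
print(fewest_stops(routes, "A", "D"))
print(fewest_stops(routes, "A", "F"))
```

Breadth-first search visits cities in order of the number of legs flown,
so the first route found to the destination is one with the fewest stops.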

In its original form, the Path Test contained 35 items with a 9-minute
time limit. Subjects were asked to record the numbers of stops for each
item in the corresponding blank space. (The response format appearing in 
Figure 3.5 is from the final version of the Path Test.) The first version 
contained three maps or airline routes with 13, 9, and 13 items, respec- 
tively. 




Figure 3.5. Sample items from Path Test. 



Results from the first tryout, conducted at Fort Carson, revealed that
the test was too easy. Virtually all of the subjects completed the test
(mean = 34.1, SD = 2.51, N = [illegible]) and the mean score was 29.9
(SD = 4.08). Item difficulty levels ranged from .48 to 1.00 with a mean
of .85 (SD = .12).

To reduce the potential for ceiling effects, an additional map or
flight path with 13 items was added to the test. Also, four very easy
items (i.e., difficulty levels ranging from .90 to 1.00) were deleted,
resulting in 44 items on the revised test. The 9-minute time limit was
retained. In the second tryout subjects completed an average of 40.7 items
(SD = 5.07) and obtained a mean score of 32.6 (SD = 7.00). Item difficulty
levels ranged from .55 to .96 with a mean of .80. These results indicated
that the changes had largely achieved the goal of making the test more
difficult.

To prepare for the pilot test conducted at Fort Lewis, the test re-
sponse format was revised to allow subjects to circle the number of stops
(i.e., 1-5) to avoid having to process written-in responses. In addition,
the time limit was reduced from 9 minutes to 8 minutes to increase the
speededness of the test.

Pilot Test Results. Path Test results obtained from the Fort Lewis
tryout are reported in Table 3.3. Subjects, on the average, completed 35.3
of the 44 items, with a range of 0 to 44. Test scores, computed by the
total number correct, ranged from 0 to 44 with a mean of 28.3.

Item difficulty levels (see Figure 3.6) ranged from .20 to .91 with a
mean of .64. Item-total correlations averaged .47 (SD = .11) with a range
of .25 to .69. Parts 1 and 2 correlate .70. The split-half reliability
estimate, corrected for test length, is .82. The Hoyt internal consistency
value is .92. These results indicated that the test is generally in excel-
lent shape.

Both marker tests were administered at the first tryout, and the ETS
Map Planning Test was also administered at the Fort Campbell and Fort Lewis
tryouts. Data from the first tryout indicate that the original Path Test
correlates .34 with ETS Choosing a Path and -.01 with ETS Map Planning.
The reader is reminded that results from Fort Carson are based on a very
small sample size (N = 19) and that the Path Test was modified greatly
following this tryout. Data from the final two tryouts indicate that the
Path Test and Map Planning correlate .62 (N = 54) and .48 (N = 118),
respectively. Although these values are not as high as marker test cor-
relations for some of the other new tests, this was expected. Recall that
the marker tests were known to be too difficult for the typical Army sample
and we set out to make the new tests easier than the marker tests.

Modifications for the Fort Knox Field Test. The Path Test remained
unchanged for the field test except for the modification in response for-
mat.

Table 3.3

Pilot Test Results from Fort Lewis: Path Test

                                         Total    Part 1    Part 2

Number of Items                             44        22        22
Time Allowed (minutes)                       8         4         4
Number of Subjects                         116       116       116

Number of Items Completed
  Mean                                   35.33     16.63     18.70
  Standard Deviation                      8.27      4.58      4.25
  Range                                   0-44      0-22      0-22

Last Item Completed by 80%
  of the Sample                            N/A        13        15

Percentage of Subjects
  Completing All Items                     19%       23%       42%

Number of Items Correct
  Mean                                   28.28     13.41     14.87
  Standard Deviation                      9.08      4.93      4.91
  Range                                   0-44      0-22      0-22

Total-Part Intercorrelations
  Total                                     **       .92       .92
  Part 1                                              **       .70
  Part 2                                                        **

Split-Half Reliability (Spearman-Brown Corrected) = .82
Hoyt Internal Consistency = .92

Figure 3.6. Distribution of item difficulty levels: Path Test.

Maze Test

Development Strategy. The Maze Test represents the second measure
constructed to assess spatial visualization/scanning. As with the Path
Test, the expert panel of judges indicated that this measure would be most
effective in predicting success for MOS involving electrical and electronic
operations, using maps in the field, and controlling air traffic.

The development strategy for this test mirrors that of the Path Test:
markers for the Maze Test again included ETS Map Planning and ETS Choosing
a Path. As with the Path Test, this test was designed to include items
geared more toward the ability level of the Project A target population
than populations for the two marker tests; that is, somewhat easier items
were appropriate for the Maze Test.

However, the Maze Test differs from the Path Test in several ways. 
The task required in the Maze Test involves finding the one pathway that 
allows exit from a maze. Items for the Maze Test were constructed to be
much easier under nonspeeded conditions than in the Path Test, and greater
emphasis was placed on speed. The Maze Test, then, was designed to measure 
visualization/scanning ability under highly speeded conditions. 

Test Development. For the first tryout the Maze Test contained 24
rectangular mazes. Each included four entrance points labeled A, B, C, and 
D, and three exit points indicated by an asterisk (*). The task is to 
determine which of the four entrances leads to a pathway through the maze 
and to one of the exit points. A 9-minute time limit was established for 
this test. 

Results from the first tryout, at Fort Carson, indicate that the
original version of the Maze Test suffered from ceiling effects. Subjects
completed, on the average, [illegible] of the 24 items and obtained a mean
score of [illegible] (SD = [illegible]).

To increase test score variance, the test was modified in two ways.
First, an additional exit was added to each test maze; Figure 3.7 shows a 
sample item from the original test and the same item modified for the Fort 
Campbell tryout. Second, the time limit was reduced from 9 to 8 minutes. 

In the second tryout, completion rates were again high (mean = 22.5,
SD = 2.49, N = 56). Consequently, for the third tryout, the time limit for
completing the 24 maze items was dropped to 6 minutes.

Pilot Test Results. Results from the Fort Lewis administration are
reported in Table 3.4. These data indicate that the reduced time limit
produced a drop in the completion rate for the Fort Lewis sample (mean =
20.65). Test scores, computed by the total number correct, ranged from 8 to
24 with a mean of 19.3.

Item difficulty levels (see Figure 3.8) range from .41 to .98 with a
mean of .80. Item-total correlations average .48 (SD = .22) with a range
of [illegible]. Parts 1 and 2 correlate .64 with each other. The split-
half reliability estimate corrected for test length is .78 and the Hoyt
internal consistency value is .88. On the whole, these results
indicate that the test is in good shape.

One or both of the marker tests, ETS Choosing a Path and ETS Map 
Planning, were administered at the three pilot test sites. Results from 
Fort Carson indicate that the Maze Test correlates .24 (N - 29) with 
Choosing a Path and .36 (N - 30) with Map Planning. These values must be 
viewed with caution because of the small sample size and because of modifi- 
cations made to the Maze Test following this tryout. Map Planning was also 
administered at the Fort Campbell and Fort Lewis tryouts. Data collected 



3-19 



EMC 




Figure 3.7. Sample items for the Maze Test. 
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Table 3.4 

Pilot Test Results from Fort Lewis: Maze Test 





Tntal 


Part 1 


Part : 


Number of Items 


24 


12 


12 


Time Allowed (minutes) 


6 


3 


3 


Number of Subjects 


118 


118 


118 


Number of Items Completed 








Mean 


20.65 


10.44 


10.21 


Standard Deviation 


3.88 


2.18 


2.19 








4-iZ 


Last Item Completed by 80% 
of the Sample 


N/A 


9 


8 


Percentage of Subjects 
Completing All Items 


38% 


57% 


50% 


Number of Items Correct 








Mean 


19.30 


9.95 


9.35 


Standard Deviation 


4.35 


2.48 


2.32 


Range 


8-24 


2-12 


4-12 


Total -Part Intercorrelations 








Total 


** 


.91 


.90 


Part 1 




** 


.64 


Part 2 






** 


Split-Half Reliability (Speannan-Brown Corrected) = 


.78 


Hoyt Internal Consistency 






.88 
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Haze Test 





3<» 
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17 
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16 


of 


is 


Items 


m 


at 


13 


Each 


12 


Difficulty 


11 


Level 


10 




9 




8 




7 
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NOTE: 



Item Difficulty Level (Proportion Passing) 
Mean - .80 
SD - .18 
Range - .41 - .98 
Number of items in the test - 24. 



Figure 3.8. Distribution of item difficulty levels: Maze Test. 
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at these posts indicate that it correlates .45 (N = 55) and .63 (N = 118), 
respectively, with the revised Maze Test. This last correlation was viewed 
as acceptable. 
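
The caution about small samples can be made concrete with an approximate confidence interval. The sketch below is not part of the original analyses; it assumes the standard Fisher z approximation, and the function name `fisher_ci` is ours. It is applied to the Map Planning correlations reported for the three tryouts.

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a Pearson r (Fisher z method)."""
    z = math.atanh(r)                # Fisher z transform of r
    se = 1.0 / math.sqrt(n - 3)      # approximate standard error of z
    lo, hi = z - z_crit * se, z + z_crit * se
    return math.tanh(lo), math.tanh(hi)   # back-transform to the r scale

# Correlations of Map Planning with the Maze Test at the three tryouts.
for r, n in [(0.36, 30), (0.45, 55), (0.63, 118)]:
    lo, hi = fisher_ci(r, n)
    print(f"r = {r:.2f}, N = {n}: approximate 95% interval ({lo:.2f}, {hi:.2f})")
```

Under these assumptions the interval for the N = 30 value is much wider than the interval for the N = 118 value, which is the sense in which the earlier estimates must be viewed with caution.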

Modifications for the Fort Knox Field Test. Results from the last 
pilot test administration showed that the Maze Test could be slightly more 
speeded. The percentage of subjects completing this test was higher than 
for the Path Test (38% for the Maze Test, and 19% for the Path Test). 
Therefore, the time limit was reduced from 6 minutes to 5 1/2 minutes for 
the Fort Knox field test. 
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FIELD INDEPENDENCE 



This construct involves the ability to find a simple form when it is 
hidden in a complex pattern. Given a visual percept or configuration, 
field independence refers to the ability to hold the percept or 
configuration in mind so as to disembed it from other well-defined 
perceptual material. 

This construct received a mean validity estimate of .30 from the panel 
of expert judges, with the highest estimate of .37 appearing for MOS that 
involve detecting and identifying targets. Field Independence received a 
priority rating of two for inclusion in the battery. 

Shapes Test 

Development Strategy. According to the expert panel of judges, a 
measure of field independence most effectively predicts success for MOS 
that involve detecting and identifying targets, using maps in the field, 
planning placement of tactical position, controlling air traffic, and 
troubleshooting operating systems such as mechanical, electrical, fluid, 
and electronic systems. 

The marker test for the Shapes Test is the Educational Testing Service's 
Hidden Figures Test, a measure included in the Preliminary Battery 
(Hough et al., 1984). In this test, subjects are asked to find one of 
five simple figures located in a more complex pattern. Initial analyses of 
the Preliminary Battery indicated that for the target population of 
first-term enlisted soldiers, the Hidden Figures Test suffers from limited 
test score variance, and possibly floor effects. For example, the initial 
data indicate that 80 percent of the sample completed fewer than 4 of the 
16 test items. The mean test score was, therefore, very low (mean = 5.16, 
SD = 3.35). 

Our strategy for constructing the Shapes Test, then, was to use a task 
similar to that in the Hidden Figures Test while ensuring that the diffi- 
culty level of test items was geared more toward the Project A target 
population. Further, we decided to include more types of items than appear 
in the Hidden Figures Test and to construct items that reflect varying 
difficulty levels ranging from easy to moderately difficult. We wanted the 
test to be speeded, but not nearly so much so as the ETS Hidden Figures 
Test. 

Test Development. At the top of each test page are five simple 
shapes; below these shapes are six complex figures. Subjects are instructed to 
examine the simple shapes and then to find the one simple shape located in 
each complex figure. (See Figure 3.9.) 

In the first tryout, at Fort Carson, the Shapes Test contained 54 
items with a 16-minute time limit. Results from this tryout indicated that 
most subjects were able to complete the entire test (e.g., mean completed = 
53.4, SD = 1.53), and most subjects obtained very high scores (mean score = 
49.3, SD = 4.17). Item difficulty levels also suggested that this test was 
very easy and suffered from ceiling effects (mean item difficulty level = 
.91, SD = .13, median = .97). 
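
As a minimal sketch of how item difficulty levels of this kind are obtained, the example below computes the proportion of subjects passing each item from a matrix of 0/1 item scores. The data layout, the function name `item_difficulties`, and the toy scores are assumptions made for illustration; they are not the Fort Carson data.

```python
from statistics import mean, median, pstdev

def item_difficulties(responses):
    """Proportion of subjects passing each item.

    `responses` is a list of per-subject lists of 0/1 item scores
    (hypothetical layout; 1 = correct, 0 = incorrect or not reached).
    """
    n_subjects = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n_subjects
            for j in range(n_items)]

# Made-up scores for 5 subjects on 4 items, for illustration only.
toy = [
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [1, 1, 1, 1],
    [1, 0, 1, 1],
]
p = item_difficulties(toy)
print("difficulty per item:", p)
print("mean:", round(mean(p), 2), "SD:", round(pstdev(p), 2),
      "median:", median(p))
```

A mean and median close to 1.0, as in the Fort Carson results, is the pattern that signals a ceiling effect: nearly every subject passes nearly every item, so the items separate subjects poorly.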
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Figure 3.9. Sample items from the Shapes Test. 
To prepare for the Fort Campbell tryout, nearly all test items were 
modified to increase item difficulty levels. Examples of item 
modifications are provided in Figure 3.9. As is shown, by adding a few 
lines to each complex pattern, the test items administered at the Fort 
Campbell tryout were made more difficult than the items administered at 
Fort Carson. 

Results from Fort Campbell indicate that test item modifications were 
successful. Subjects, on the average, completed 43.5 (SD = 3.79) of the 54 
items within the 16-minute time limit, and obtained a mean score of 30.7 
(SD = 23.5, and median difficulty level = .67). 

This test was modified only slightly for the Fort Lewis administration. 
For example, a few complex figures inadvertently contained more than 
one simple figure. (This was revealed in the item analyses.) These items 
were revised to ensure that no more than one simple figure could be located 
in each complex figure. The Shapes Test administered to the Fort Lewis 
sample contained 54 items with a 16-minute time limit. 

Pilot Test Results. Table 3.5 contains Fort Lewis results from the 
Shapes Test. Mean number completed is 42.4. The mean number correct for 
this sample is 29.3 with a range of 12 to 51, indicating that the measure 
does not suffer from ceiling effects. 

Item difficulty levels (see Figure 3.10) range from .10 to .97 with a 
mean of .54 (SD = .25). Item-total correlations range from .07 to .57 with 
a mean of .39 (SD = .13). Reliability estimates indicate that Parts 1 and 2 
correlate .69; with the Spearman-Brown correction, this value is .82. The 
Hoyt reliability estimate for this test is .89. As a whole, these results 
show the test to be in good shape. 
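
The step from the Part 1 - Part 2 correlation to the corrected split-half value follows the Spearman-Brown formula for a test of doubled length, r' = 2r / (1 + r). A short sketch follows; the function name `spearman_brown` and its general `factor` argument are ours. It reproduces the corrected values reported in this chapter from the part correlations.

```python
def spearman_brown(r_half, factor=2.0):
    """Projected reliability when test length is multiplied by `factor`."""
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)

# Part 1 / Part 2 correlations reported for the Fort Lewis pilot test.
part_correlations = {
    "Maze Test": 0.64,
    "Shapes Test": 0.69,
    "Orientation Test 1": 0.86,
    "Orientation Test 2": 0.80,
    "Orientation Test 3": 0.79,
}
for test, r in part_correlations.items():
    print(f"{test}: r = {r:.2f}, corrected = {spearman_brown(r):.2f}")
```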

The marker test, ETS Hidden Figures Test, was administered at the 
first two tryouts. Results from Fort Carson indicate that the original 
version of the Shapes Test correlated .35 with the Hidden Figures Test 
(N = 29). Data from Fort Campbell indicate that the revised Shapes Test 
correlates .50 with its marker (N = 56). Although a bit lower than 
desirable, this was not unexpected because of the planned differences in 
difficulties of the two tests. 

Modifications for the Fort Knox Field Test. The Shapes Test needed 
only minor revisions for the field test. For example, item-total 
correlations for a few items indicated that more than one shape could 
still be located in a complex figure test item, so these figures were 
modified. 
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Table 3.5 

Pilot Test Results from Fort Lewis: Shapes Test

                                      Total        Part 1       Part 2

Number of Items                         54           27           27
Time Allowed (minutes)                  16            8            8
Number of Subjects                     118          118          118

Number of Items Completed
  Mean                                42.42        20.78        21.64
  Standard Deviation                   9.29         5.14         5.05
  Range                               17-54         8-27         8-27
  Last Item Completed by 80%
    of the Sample                      N/A           16           17
  Percentage of Subjects
    Completing All Items               12%          24%          23%

Number of Items Correct
  Mean                                29.28        14.49        14.79
  Standard Deviation                   9.14         5.03         4.92
  Range                               12-51         5-26         4-25

Total-Part Intercorrelations
  Total                                 **           .92          .92
  Part 1                                             **           .69
  Part 2                                                          **

Split-Half Reliability (Spearman-Brown Corrected) = .82
Hoyt Internal Consistency = .89
[Histogram: Shapes Test. Vertical axis: Number of Items at Each Difficulty
Level. Horizontal axis: Item Difficulty Level (Proportion Passing).]

NOTE: Mean = .54; SD = .25; Range = .10 - .97; number of items in the
test = 54.

Figure 3.10. Distribution of item difficulty levels: Shapes Test. 
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SPATIAL ORIENTATION 



This construct involves the ability to maintain one's bearings with 
respect to points on a compass and to maintain appreciation of one's loca- 
tion relative to landmarks in the environment. 

This particular construct was not included in the list of predictor 
constructs evaluated by the expert panel. The rationale for developing 
measures of spatial orientation for inclusion in the Pilot Trial Battery is 
described below. 

Conceptualization and measurement of this ability construct first 
appeared during World War II, when researchers for the Army Air Force (AAF) 
Aviation Psychology Program explored a variety of constructs to aid in the 
selection of air crew personnel. Spatial orientation measures were de- 
signed to predict success in air crew positions that required familiarity 
with points on a compass, the ability to apprehend directions quickly and 
accurately, and the ability to remain directionally oriented in spite of 
sudden and frequent changes in direction. Results from the AAF Program 
indicated that measures of spatial orientation were useful in selecting 
pilots and navigators (Guilford & Lacey, 1947).¹ 

During the second year of Project A, several Task 2 personnel from 
PDRI had the opportunity to observe recruits performing on the job.² These 
job observations included soldiers from a variety of MOS, such as 
administrative specialists, cannon crewmen, armor crewmen, radio and 
teletype operators, light wheel vehicle/power generator equipment 
mechanics, infantrymen, military police, and MANPADS personnel. Information 
collected during these job observations suggested that some MOS involve 
critical job requirements of maintaining directional orientation and 
establishing location using features or landmarks in the environment. For 
example, armor or tank crewmen when performing in the field must be able to 
reorient themselves quickly as the tank turret turns or rotates; MANPADS 
personnel need to establish their location in the field, relative to the 
location of friendly and enemy troops, using features or landmarks in the 
environment. 

Information obtained from these job observations was reported, in 
part, at the March 1984 Task 2 IPR. Participants in this meeting agreed 
that measures of spatial orientation would be useful in predicting perfor- 
mance in Army MOS that require orientation abilities if a soldier is to be 
successful on the job. Three measures were developed for this construct. 

Orientation Test 1 

Development Strategy. As reported above, information collected during 
job observations suggested that a measure of spatial orientation would be 
most effective in predicting success for MOS that include such critical job 
requirements as identifying tactical positions, determining location of 
friendly and enemy troops, and using features or landmarks in the 
environment to establish and maintain one's bearings. 

¹ Dr. Lloyd Humphreys, of the Scientific Advisory Group for Project A, 
particularly emphasized the usefulness of this construct to us. 

² Dr. Jay Uhlaner, also of SAG, originally suggested that job observation 
sessions would be especially helpful at this stage of the research, which 
indeed proved to be the case. 

Paper-and-pencil measures that tap this ability were developed by 
researchers in the U.S. Army Air Force's Aviation Psychology Program. 
Direction Orientation Form B (CP515B) served as the marker for Orientation 
Test 1. The strategy for developing Orientation 1 involved generating 
items that duplicated the task in the Army Air Force's test. Each item 
contained six circles. The first, the standard compass or "given" circle, 
indicates the direction of North and usually is rotated out of the conven- 
tional position. The remaining circles are test compasses that also have 
directions marked on them. 

For this test, item construction was limited to one of seven possible 
directions: South, East, West, Southwest, Northwest, Southeast, and North- 
east. Thus, item difficulty levels were not expected to vary greatly. 
(Off-quadrant directional items such as Northwest or Southeast were, 
however, viewed as more difficult than South, East, or West directional 
items.) Our plan for this test was to ask subjects to complete numerous 
compass directional items within a short period of time. Orientation 1, 
then, was designed as a highly speeded test of spatial orientation. 

Test Development. In its original form, each test item presented 
subjects with six circles. The first, the Given Circle, indicated the 
compass direction for North. For most items, North was rotated out of its 
conventional position (i.e., the top of the circle did not necessarily 
represent North). Compass directions also appeared on the remaining five 
circles. The subject's task was to determine, for each circle, whether or 
not the direction indicated was correctly positioned by comparing it to the 
direction of North in the Given Circle. (See Example 1 in Figure 3.11.) 

When administered to the Fort Carson sample, this test contained 20 
item sets requiring 100 responses (i.e., for every item, compass directions 
on five circles must be evaluated). Subjects were given 8 minutes to 
complete the test. Test scores were determined by the total number cor- 
rect; the maximum possible was 100. 

Results from this first tryout showed that nearly all subjects 
completed the items within the time allotted (mean completed was 18.6 out 
of the 20 sets of items); they obtained a mean score of 82.7 (SD = 17.9). 
Item difficulty levels indicate that most items were moderately easy 
(mean = 82.7, SD = 11.1). 

Thus, for the Fort Campbell tryout, we attempted to create more dif- 
ficult items by modifying directional information provided in the Given 
Circle. That is, rather than indicating the direction for North, compass 
directions for South, East, or West were provided. These directions were 
also rotated out of conventional compass position. (See Example 2, Figure 
3.11.) 
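
The judgment required by each circle can be sketched as simple modular arithmetic. The example below is an illustration only, not the scoring procedure used in the tryouts; it assumes that positions on a circle are expressed in degrees clockwise from the top of the page, and the function names are ours. It covers both the original items (North given) and the Fort Campbell items (South, East, or West given).

```python
# Compass bearings in degrees clockwise from North.
BEARING = {"N": 0, "NE": 45, "E": 90, "SE": 135,
           "S": 180, "SW": 225, "W": 270, "NW": 315}

def expected_position(given_dir, given_pos, test_dir):
    """Where `test_dir` should sit on a circle, in degrees clockwise from
    the top of the page, when `given_dir` is drawn at `given_pos`."""
    # How far the whole compass has been turned away from its usual position.
    rotation = (given_pos - BEARING[given_dir]) % 360
    return (BEARING[test_dir] + rotation) % 360

def is_correct(given_dir, given_pos, test_dir, test_pos):
    """True if the direction marked on a test circle is consistent
    with the Given Circle."""
    return expected_position(given_dir, given_pos, test_dir) == test_pos % 360

# Given Circle shows North at the 3 o'clock position (90 degrees);
# a test circle marks South at 9 o'clock (270 degrees).
print(is_correct("N", 90, "S", 270))

# Fort Campbell style item: Given Circle shows East at the top (0 degrees);
# a test circle marks North at 3 o'clock (90 degrees).
print(is_correct("E", 0, "N", 90))
```

In the second case the subject must first recover where North lies from the position of East before checking the test circle, which is one plausible reason the non-North items were harder.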
Figure 3.11. Sample items from Orientation Test 1. 

Orientation Test 1, as administered at the Fort Campbell tryout, 
contained 30 item sets (150 items). It was administered in three 
separately timed parts. Parts One and Two included the original test items, 
whereas Part Three included the new (non-North) items. This last part of 
the test was preceded by additional test instructions that informed 
subjects about the change in Given Circle directions. Subjects were given 3 
minutes to complete each part, for a total of 9 minutes. 

Results from this second tryout indicate that for the total test, 
subjects completed 23.5 of the 30 item sets (or 117.10 items) and obtained 
a mean score of 100.8 (SD = 24.0). Scores on Part Three yielded lower 
correlations with Parts One and Two (both are .44); Parts One and Two 
correlated .87. From this information we reasoned that the new items were 
assessing additional information about subjects' abilities to maintain 
orientation. 

We then mixed item sets from Part Three with item sets from Parts One 
and Two to create a test with 30 item sets (150 items) for the Fort Lewis 
tryout. The time limit was increased to a total of 10 minutes, and test 
instructions were modified to explain that items vary throughout the test 
with respect to information provided in the Given Circle. Again, test 
score was determined by the number of items correct (maximum score is 150). 

Pilot Test Results. Results from the Fort Lewis pilot test are 
reported in Table 3.6. Completion rates for the total test indicated that, 
on the average, subjects attempted 25 of the 30 item sets (or 125.7 of 150 
items) and obtained a mean score of 117.9 (SD = 24.2). 

Item difficulty levels (see Figure 3.12) ranged from .21 to .97 with a 
mean of .79. Item-total correlations are at acceptable levels (mean = .43, 
SD = .14). The correlation between Parts One and Two is .86. Reliability 
estimates are as follows: Split-half Spearman-Brown corrected = .92, 
Hoyt = .97. These results indicate that the test was performing as 
intended. 
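
For readers unfamiliar with the Hoyt estimate, the sketch below shows the analysis-of-variance form in which it is usually written: reliability is 1 minus the ratio of the residual (subject-by-item) mean square to the between-subjects mean square. The data layout, the function name `hoyt_reliability`, and the toy scores are assumptions for illustration and are not the pilot test data.

```python
def hoyt_reliability(scores):
    """Hoyt's analysis-of-variance reliability estimate.

    `scores` is a list of per-subject lists of item scores
    (hypothetical layout). Returns 1 - MS_residual / MS_subjects.
    """
    n = len(scores)            # number of subjects
    k = len(scores[0])         # number of items
    grand = sum(map(sum, scores)) / (n * k)
    subj_means = [sum(row) / k for row in scores]
    item_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Partition the total sum of squares into subjects, items, and residual.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_item = n * sum((m - grand) ** 2 for m in item_means)
    ss_resid = ss_total - ss_subj - ss_item

    ms_subj = ss_subj / (n - 1)
    ms_resid = ss_resid / ((n - 1) * (k - 1))
    return 1.0 - ms_resid / ms_subj

# Made-up 0/1 scores for 5 subjects on 4 items, for illustration only.
toy_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(hoyt_reliability(toy_scores), 2))
```

This quantity is algebraically the same as coefficient alpha computed from item and total-score variances, so it uses every item rather than a single split into two parts. That difference in method is a likely reason the Hoyt and corrected split-half values in this chapter do not always agree.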

No marker tests for this construct were included in any of the three 
pilot test administrations. However, two other new measures of spatial 
orientation (Orientation 2 and Orientation 3) were developed for the Pilot 
Trial Battery and correlations between Orientation 1 and these other new 
tests were obtained. (These new tests are described below.) From the Fort 
Carson data, Orientation 1 correlated .40 with Orientation 2 (N = 30) 
and .66 with Orientation 3 (N = 25). Results from Fort Campbell indicate 
that Orientation 1 correlated .45 with Orientation 2 and .72 with 
Orientation 3 (N = 56). Finally, for the Fort Lewis sample, these same 
measures correlated .53 and .68, respectively (N = 118). These results were 
viewed as indicating that Orientation 1 was tapping the appropriate 
constructs, but was not redundant with the other new tests. 

Modifications for the Fort Knox Field Test. Very few changes were 
made on this test; for example, one item was "cleaned up" to avoid 
confusion about the compass direction provided on the Given Circle. The 
field test version of Orientation Test 1 contained 30 item sets (150 items) 
with a 10-minute time limit. 
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Table 3.6 

Pilot Test Results from Fort Lewis: Orientation Test 1

                                    Total          Part 1        Part 2

Number of Items                    30(150)         15(75)        15(75)
Time Allowed (minutes)                10              5             5
Number of Subjects                   118            118           118

Number of Items Completed
  Mean                          25.14(125.7)   11.75(58.75)  13.39(66.95)
  Standard Deviation                4.88           2.96          2.35
  Range                         12-30(60-150)   5-15(25-75)   5-15(25-75)
  Last Item Completed by 80%
    of the Sample                    N/A           9(45)        12(60)
  Percentage of Subjects
    Completing All Items             31%            32%           55%

Number of Items Correct
  Mean                             117.86          56.50         61.36
  Standard Deviation                24.16          12.28         12.80
  Range                            46-150          25-75         21-75

Total-Part Intercorrelations
  Total                              **             .96           .96
  Part 1                                            **            .86
  Part 2                                                          **

Split-Half Reliability (Spearman-Brown Corrected) = .92
Hoyt Internal Consistency = .97
Figure 3.12. Distribution of item difficulty levels: Orientation Test 1. 
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Orientation Test 2 



Development Strategy. The second measure of spatial orientation was 
also designed to tap abilities that might predict success for MOS that 
involve maintaining appreciation of one's location relative to landmarks in 
the environment or in spite of frequent changes in direction. Orientation 
Test 2 is a relatively new approach to assessing spatial orientation abili- 
ties. 

Although no particular test served as its model, it is similar to a 
measure designed by Army Air Force researchers to select pilots, naviga- 
tors, and bombardiers (Directional Orientation: CP5150). Items in the AAF 
test consist of two aerial photographs of the same landscape. On the first 
photograph, a compass is indicated. The second photograph is rotated 
relative to the first photograph and contains an arrow, again indicating 
direction. Subjects must determine in which direction the arrow in the 
second picture is pointed, based on the compass direction given in the 
first photograph and the degree of rotation of the second photograph. 
Thus, the AAF test measures the ability to maintain one's perspective with 
regard to the directional relationships of several objects (e.g., the first 
aerial photograph) when the objects have been rotated (e.g., the second 
aerial photograph). 

The task we designed for Orientation Test 2 asks subjects to mentally 
rotate objects and then to visualize how components or parts of those 
objects will appear after the object is rotated. Item difficulty levels 
were varied by altering the degree of rotation required to correctly 
complete each part of the task. Because of the complexity of the task, 
Orientation 2 was initially viewed as a power test of spatial orientation. 

Test Development. For Orientation Test 2, we chose to design a task 
involving common objects. Each item contains a picture within a circular 
or rectangular frame. At the bottom of the frame is a circle with a dot 
inside it. The picture or scene is not in an upright position. The task 
is to mentally rotate the frame so that the bottom of the frame is 
positioned at the bottom of the picture; after doing so, one must then 
determine where the dot will appear in the circle. (See Figure 3.13 for sample 
items.) For the Fort Carson tryout, this test contained 20 items with an 
8-minute time limit. 

Results from this administration indicate that the time limit was 
sufficient (mean number completed = 19.9, SD = 4.55). Item difficulty 
levels were somewhat lower than desired (mean = .52, SD = .16). Item-total 
correlations were, however, impressive (mean = .48, SD = .10). The only 
potential problem with this measure involved the test instructions, as some 
subjects required additional instructions to understand what was required. 
Therefore, for the Fort Campbell tryout, test instructions were modified to 
clarify the task. 

Data collected at Fort Campbell provide very similar information about 
this test. For example, nearly all subjects completed this test (mean = 
19.7, SD = .71). Item-total correlations were again impressive (mean 
= .46, SD = .13). The mean score and item difficulty levels indicated that 
the test was more difficult for this group than for the Fort Carson sample 
(mean score = 8.61, SD = 4.49; mean item difficulty = .43, SD = .11). 
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Figure 3.13. Sample items from Orientation Test 2. 
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Because of these item difficulty levels, we decided to add four new 
test items, constructed using item difficulty information obtained for the 
Fort Campbell sample. That is, items were examined to identify what ap- 
peared to make them more or less difficult, and new, easier items were 
written using this information. Primarily, this involved constructing 
items so that rotations of 90, 180, or 270 degrees were correct. 

Orientation Test 2, as administered to the Fort Lewis sample, con- 
tained 24 items. A 10-minute time limit was established to correspond to 
the increase in the number of items. Test scores on this measure are 
determined by the total number correct. 

Pilot Test Results. Table 3.7 contains the results from the Fort 
Lewis test. These data indicate that Orientation 2 is a power test (mean 
number completed = 23.7, SD = 1.04). Subjects obtained a mean score of 
11.5 (SD = 6.20). 

Item difficulty levels (see Figure 3.14) ranged from .19 to .71 with a 
mean of .48. This represents a slight increase from the Fort Campbell 
tryout, indicating the test was somewhat easier. Item-total correlations 
remained high, ranging from .22 to .74 with a mean of .53. Scores from 
Parts 1 and 2 correlate .80. Correcting this value for test length yields 
a split-half reliability estimate of .89. The Hoyt internal consistency 
value is also .89. Thus, this test has excellent reliability and 
distributional properties and met its goal of being a power test. 

As noted above, no marker tests for this test were administered in any 
of the three tryouts. Correlations with the other newly developed measures 
of spatial orientation were obtained at each tryout. Data from Fort Carson 
indicate that Orientation 2 correlates .40 with Orientation 1 (N = 29) 
and .42 with Orientation 3. Results from Fort Campbell indicate that these 
same measures correlate .45 and .54 (N = 56). Finally, the Fort Lewis data 
indicate the measures correlate .53 and .65 (N = 118). These correlations 
were viewed as about right; that is, Orientation Test 2 did correlate 
moderately with the other Orientation tests, but not so highly as to be 
redundant. 

Modifications for the Fort Knox Field Test. For the Fort Knox 
administration, this measure was unchanged except for the usual 
modification of the response format. 

Orientation Test 3 

Development Strategy. This test was also designed to measure spatial 
orientation. As with the other two measures of this construct, Orientation 
Test 3 is expected to be useful in predicting success for MOS that involve 
establishing and maintaining one's bearing using features or landmarks in 
the environment. 

Orientation Test 3 was modeled after another spatial orientation test, 
Compass Directions, developed by researchers in the Army Air Force's Avia- 
tion Psychology Program. The AAF measure was designed to assess the abili- 
ty to reorient oneself to a particular ground pattern quickly and accurate- 
ly when compass directions are shifted about. Orientation 3 was designed 
to assess the same ability, using a similar test format. 
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Table 3.7 

Pilot Test Results from Fort Lewis: Orientation Test 2

                                      Total        Part 1       Part 2

Number of Items                         24           12           12
Time Allowed (minutes)                  10            5            5
Number of Subjects                     118          118          118

Number of Items Completed
  Mean                                23.73        11.85        11.88
  Standard Deviation                   1.04          .71          .45
  Range                               16-24         6-12         9-12
  Last Item Completed by 80%
    of the Sample                      N/A           12           12
  Percentage of Subjects
    Completing All Items           [illegible]      93%          92%

Number of Items Correct
  Mean                                11.5      [illegible]      6.16
  Standard Deviation                   6.20         3.25         3.28
  Range                                3-24         0-12         0-12

Total-Part Intercorrelations
  Total                                 **           .95          .95
  Part 1                                             **           .80
  Part 2                                                          **

Split-Half Reliability (Spearman-Brown Corrected) = .89
Hoyt Internal Consistency = .89
Figure 3.14. Distribution of item difficulty levels: Orientation Test 2. 
Items for Orientation 3 were constructed to yield varying difficulty 
levels from moderately easy to moderately difficult. This test was 
designed to place somewhat more emphasis on speed than on power. 

Test Development. In its original form, Orientation 3 presented 
subjects with a map that includes various landmarks such as a barracks, a 
campsite, a forest, a lake, and so on. Within each item, subjects are 
provided with compass directions by information on the direction of one 
landmark with respect to another, such as "the forest is north of the 
campsite." Subjects are also informed of their present location relative to 
another landmark. Given this information, the subject must determine which 
direction to go to reach yet another structure or landmark. Figure 3.15 
contains one test map and two sample items. Note that for each item, new 
or different compass directions are given. 
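
The reasoning an item of this kind demands can be sketched as two steps: use the stated relationship between two landmarks to fix where North lies on the page, then express the route from the present location to the goal relative to that North. The example below is an illustration under stated assumptions; the map coordinates, the landmark layout, and the function names are hypothetical and are not taken from the test maps.

```python
import math

NAMES = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]

def page_angle(src, dst):
    """Angle of the vector src -> dst, in degrees clockwise from page-up."""
    dx, dy = dst[0] - src[0], dst[1] - src[1]
    return math.degrees(math.atan2(dx, dy)) % 360

def direction_to(landmarks, ref_from, ref_to, ref_dir, start, goal):
    """Compass direction from `start` to `goal`, given that `ref_to` lies
    `ref_dir` of `ref_from` (which fixes where North is on the page)."""
    # Step 1: page angle at which North points for this item.
    north = (page_angle(landmarks[ref_from], landmarks[ref_to])
             - 45 * NAMES.index(ref_dir)) % 360
    # Step 2: route bearing relative to that North, to the nearest 45 degrees.
    bearing = (page_angle(landmarks[start], landmarks[goal]) - north) % 360
    return NAMES[round(bearing / 45) % 8]

# Hypothetical map coordinates (x to the right, y up the page).
landmarks = {"campsite": (0, 0), "forest": (4, 0),
             "lake": (0, 3), "barracks": (4, 3)}

# "The forest is north of the campsite. You are at the lake.
#  Which direction must you go to reach the barracks?"
print(direction_to(landmarks, "campsite", "forest", "N", "lake", "barracks"))
```

Because the reference statement changes from item to item, the same pair of landmarks can call for different answers on the same map, which is what forces the subject to reorient for every question.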

For the Fort Carson tryout, the test contained two maps with 10 
questions about each map, for a total of 20 items. Subjects were given 12 
minutes to complete the test. Results from this first tryout revealed very 
few problems with the test (e.g., test instructions were clear, the time 
was sufficient, no floor nor ceiling effects appeared). Thus, this measure 
remained unchanged for the Fort Campbell pilot test. 

Results from the second tryout yielded similar information (e.g., no 
ceiling nor floor effects, acceptable completion rates). These data, 
however, indicated that for a few items, two responses might be correct due 
to a lack of precision in drawing the two maps. Accordingly, landmarks on 
each map were repositioned to ensure that one and only one correct answer 
existed for each item. In addition, one item was rewritten to make its 
wording uniform with other test items. When administered to the Fort Lewis 
sample, Orientation 3 contained 20 test items with a 12-minute time limit. 
Test scores are determined by the total number correct. 

Pilot Test Results. Results from the Fort Lewis administration are 
reported in Table 3.8. On the average, subjects completed 18 items. The 
mean score of 8.7 indicates that subjects correctly answered about one-half 
of the items attempted. 

Item difficulty levels (see Figure 3.16) range from .24 to .63 with a 
mean of .44. Item-total correlations range from .48 to .72 with a mean 
of .59 (SD = .07). Part 1 and Part 2 correlate .79. The split-half 
reliability estimate corrected for test length is .88, while the Hoyt 
internal consistency estimate is .90. These results indicate that the test 
is highly reliable, had acceptable distributional properties, and was 
appropriately speeded. 

Data from Fort Carson indicate that Orientation Test 3 correlates .66 
with Orientation 1 (N = 29) and .42 with Orientation 2 (N = 31). Values 
for these same measures administered at Fort Campbell are .72 and .54 
(N = 56). Data from Fort Lewis indicate that these measures correlate .68 
and .65 (N = 118). As with the other two Orientation tests, these results 
were viewed as acceptable. 

Modifications for the Fort Knox Field Test. This test was unchanged 
for the Fort Knox field test except for the response format modifications. 
Figure 3.15. Sample items from Orientation Test 3. 
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Table 3.8 

Pilot Test Results from Fort Lewis: Orientation Test 3

                                      Total        Part 1       Part 2

Number of Items                         20      [illegible]  [illegible]
Time Allowed (minutes)                  12      [illegible]  [illegible]
Number of Subjects                     118      [illegible]  [illegible]

Number of Items Completed
  Mean                                18.12         8.82     [illegible]
  Standard Deviation               [illegible]  [illegible]  [illegible]
  Last Item Completed by 80%
    of the Sample                      N/A            7      [illegible]

Number of Items Correct
  Mean                                 8.71         3.99         4.72
  Standard Deviation                   5.78     [illegible]  [illegible]

Total-Part Intercorrelations
  Part 1                                             **           .79

Split-Half Reliability (Spearman-Brown Corrected) = .88
Hoyt Internal Consistency = .90
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Figure 3.16. Distribution of item difficulty levels: Orientation Test 3. 
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INDUCTION - FIGURAL REASONING 



This construct involves the ability to generate hypotheses about 
principles governing relationships among several objects. 

Example measures of induction include the Employee Aptitude Survey 
Test 6 - Numerical Reasoning (EAS-6), Educational Testing Service's Figure 
Classification, Differential Aptitude Test (DAT) Abstract Reasoning, 
Science Research Associates (SRA) Word Grouping, and Raven's Progressive 
Matrices. These paper-and-pencil measures present subjects with a series 
of objects such as figures, numbers, or words. To complete the task, 
subjects must first determine the rule governing the relationship among the 
objects and then apply the rule to identify the next object in the series. 

The panel of expert judges indicated that a measure of inductive 
reasoning would be useful for predicting success in numerous Army MOS. 
Specifically, for figural reasoning these judges estimated the mean validi- 
ty at .25. The Army's current selection and classification system measures 
reasoning ability using word problems, but lacks a general measure of 
hypothesis generation and application. Two measures of reasoning were 
developed. 

Reasoning Test 1 

Development Strategy. According to the panel of experts, a measure of 
figural reasoning should effectively predict success in a wide variety of 
MOS, especially those that involve troubleshooting, inspecting and re- 
pairing operations systems, analyzing intelligence data, controlling air 
traffic, and detecting and identifying targets. 

Published tests selected as markers for the induction construct in- 
cluded EAS-6 Numerical Reasoning and ETS Figure Classification. In the 
Numerical Reasoning Test, subjects are asked to examine a series of numbers 
to determine the pattern or the principle governing the relationship among 
the numbers in the series; subjects must then apply the principle to iden- 
tify the number appearing next in the series. In the ETS Figure Classifi- 
cation Test, subjects are asked to examine two (or three) groups of figures 
to determine how the figures in one group are alike and how the groups
differ; subjects must then classify test figures into one of the two (or 
three) groups. 

Our plan for developing Reasoning Test 1 was to construct a test that 
was similar to the task appearing in EAS-6 Numerical Reasoning, but with 
one major difference: items would be composed of illustrations rather than 
numbers. Test items were constructed to represent varying degrees of
difficulty ranging from very easy to very difficult. Following item
development, time limits were established to allow sufficient time for
subjects to complete all or nearly all items. Thus, Reasoning 1 was designed
as a power measure of induction. 

Test Development. Reasoning Test 1 items present subjects with a
series of four figures. The task is to identify the pattern or relationship
among the figures and then to identify from among five possible
answers the one figure that appears next in the series. In the original 
test, subjects were asked to complete 30 items in 14 minutes. Sample 
items are provided in Figure 3.17.

Results from the first tryout, conducted at Fort Carson, indicate that
subjects, on the average, completed 29.5 (SD = 1.39) items and obtained a
mean score of 20.8 (SD = 3.54). Inspection of difficulty levels indicated
that items were unevenly distributed between the two test parts. Items
were therefore reordered to ensure that easy and difficult items were
equally distributed throughout both test parts. Only minor modifications
were made to test items; for example, one particularly difficult item was
redrawn to make it less difficult.

Data collected at Fort Campbell indicate that again nearly all subjects
completed the test (mean = 29.7, SD = 1.50). Further, test administrators
reported that those who completed the test finished early. Thus,
the 14-minute time limit was reduced to 12 minutes. Further, two items
were revised because distractors yielded higher item-total correlations
than the correct response.
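
The distractor check described above can be illustrated with a short
sketch. The Python below is an assumption-laden example, not the procedure
actually used in the tryouts: the function names (pearson, flag_items),
the response layout, and the use of an uncorrected total score (each item
is included in its own total) are all hypothetical. For every item it
correlates a 0/1 indicator for each response option with the total score
and flags the item when any distractor correlates more highly than the
keyed answer.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # no variance: treat the correlation as zero
    return sxy / sqrt(sxx * syy)

def flag_items(responses, key):
    """Indices of items where a distractor correlates with total score
    more strongly than the keyed (correct) option does.

    responses: one list of chosen options per subject (hypothetical layout)
    key:       the correct option for each item
    """
    # Total score = number of keyed responses (item itself included).
    totals = [sum(r == k for r, k in zip(row, key)) for row in responses]
    flagged = []
    for i, correct in enumerate(key):
        options = {row[i] for row in responses}
        r = {opt: pearson([int(row[i] == opt) for row in responses], totals)
             for opt in options}
        r_key = r.get(correct, 0.0)
        if any(v > r_key for opt, v in r.items() if opt != correct):
            flagged.append(i)
    return flagged

# Made-up responses for four subjects on three items; in the third item
# the higher scorers choose "D" although the key says "C".
demo_responses = [["A", "B", "D"], ["A", "B", "D"],
                  ["B", "A", "C"], ["B", "A", "C"]]
demo_key = ["A", "B", "C"]
print(flag_items(demo_responses, demo_key))
```

An item flagged this way is a candidate for revision; whether the key or
the distractor is at fault still has to be judged by inspecting the item.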

Pilot Test Results. Data collected at the third tryout, conducted at
Fort Lewis, are reported in Table 3.9. Subjects, on the average, completed
29.4 items with about 84 percent of the subjects completing the entire
test. Test scores, computed as the total number correct, ranged from 4
to 29 with a mean of 19.6.

Item difficulty levels ranged from .26 to .92 with a mean of .66.
Item-total correlations averaged .45 (SD = .10) with a range of .24 to .60.
Part 1 and Part 2 correlate .64. The split-half reliability estimate
corrected for test length is .78, while the Hoyt value is .86. These
results indicated the test was in good shape; it was a reliable power test
with acceptable distributional properties.
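
The correction for test length can be reproduced from the part-part
correlation with the Spearman-Brown formula, which the footnote to
Table 3.11 names as the procedure used. A minimal sketch follows; the
function name and the generalized lengthening factor are ours, and only
the doubled-length case applies to the split-half estimates reported here.

```python
def spearman_brown(r_part, factor=2.0):
    """Step up a part-part correlation to the reliability of a test
    `factor` times as long (factor=2 for two separately timed halves)."""
    return factor * r_part / (1.0 + (factor - 1.0) * r_part)

# Reasoning Test 1 at Fort Lewis: Part 1 and Part 2 correlate .64.
print(round(spearman_brown(0.64), 2))
```

With the Part 1-Part 2 correlation of .64 this gives 2(.64)/(1 + .64),
or about .78, the corrected value cited in the text.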

One of the marker tests, ETS Figure Classification, was administered 
at the first two tryout sites. The Fort Carson data indicate Reasoning
Test 1 correlates .34 (N = 22) with this measure, while the Fort Campbell
data indicate that the two correlate .25 (N = 56). Because the task
involved in Reasoning 1 differs from that in ETS Figure Classification, the 
low value of these correlations is not alarming. 

Two other marker measures of induction, SRA Word Grouping and DAT 
Abstract Reasoning, were administered at the Fort Lewis tryout. These data
indicate that Reasoning 1 correlates .47 with Word Grouping and .74 with 
Abstract Reasoning. These data are compatible with our understanding of 
these two marker measures of induction. Word Grouping contains a verbal 
component while Abstract Reasoning measures induction via figural reason- 
ing, similar to Reasoning Test 1. 

Modifications for the Fort Knox Field Test. For the Fort Knox field
test, instructions for Reasoning Test 1 were revised slightly.

Reasoning Test 2 

Development Strategy. This measure was also designed to assess
induction using items that require figural reasoning.
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Figure 3.17. Sample items from Reasoning Test 1. 
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Table 3.9 

Pilot Test Results from Fort Lewis: Reasoning Test 1

Figure 3.18. Distribution of item difficulty levels: Reasoning Test 1.

Published tests serving as markers for Reasoning Test 2 include EAS-6
Numerical Reasoning and ETS Figure Classification; these measures were
described for Reasoning Test 1. The original strategy was to develop
Reasoning Test 2 fairly similarly to ETS Figure Classification. Initial
Preliminary Battery analyses conducted on ETS Figure Classification data
(N = 1,863) indicated that this test was too highly speeded for the target
population (Hough et al., 1984). For example, 80 percent of recruits
completing the Figure Classification test finished fewer than half of the
112 items. Further, although item difficulty levels varied greatly, the
mean value indicated most items are moderately easy (mean = .73, SD = .22,
range = .06 to .98). Thus, although the ETS Figure Classification test
served as the marker in early test development planning for Reasoning 2,
the new measure differed in several ways, as described below.

First, ETS Figure Classification requires subjects to perform two
tasks: to identify similarities and differences among groups of figures,
and then to classify test figures into those groups. Items in Reasoning
Test 2 were designed to involve only the first task, identifying
similarities and differences among figures. Second, test items on
Reasoning 2 were constructed to reflect a wide range of difficulty levels,
with the average item falling in the moderately difficult range. Finally,
because the items would be more difficult overall, we decided that
Reasoning 2 would contain fewer items than were included in the Figure
Classification Test. The time limit for Reasoning 2 was established to
ensure that most subjects would complete the test. Thus, Reasoning 2 was
designed as a power measure of figural reasoning, with a broad range of
item difficulties.

Test Development. Reasoning 2 test items present subjects with five
figures. Subjects are asked to determine which four of the figures are
similar in some way, thereby identifying the one figure that differs from
the others. (See Figure 3.19.) This test, when first administered,
contained 32 items with an 11-minute time limit.

Results from the Fort Carson tryout indicated that nearly all subjects
completed the entire test (mean = 31.6, SD = 1.09, N = 38). Item difficulty
levels were somewhat higher than expected (that is, the items were easier
than intended), ranging from .05 to 1.00 with a mean of .71 (SD = .29).
Because eight items yielded item difficulty levels of .97 or above, these
items were either modified or replaced to make them more difficult.
Moreover, inspection of item difficulties indicated that Part 1 contained
a greater proportion of the easier items, so items were redistributed
throughout the test to obtain an equal mix of easy and difficult items,
and to attempt to increase the relatively low part-part correlation
(r = .32).

For the Fort Campbell tryout, Reasoning 2 again contained 32 items
with an 11-minute time limit. Data from this tryout indicated that, for
the most part, the test possessed desirable psychometric qualities. For
example, nearly all subjects completed the test (mean = 31.1, SD = 1.91).
Test scores ranged from 9 to 26 with a mean of 19.1 (SD = 3.56), and the
test was a bit more difficult (mean item difficulty = .56, SD = .34).
Although the part-part correlation increased from the first tryout, it
still remained low (i.e., Fort Campbell r = .40 versus Fort Carson
r = .32).

A few changes were made in the test prior to the third tryout. For 
example, four items contained a distractor that was selected more often and 
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Figure 3.19. Sample items from Reasoning Test 2. 
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which yielded a higher item-total correlation than the correct response; 
these distractors were revised. Further, test administrators at Fort
Campbell noted that the time limit could be reduced without altering test
completion rates. Consequently, the time limit was reduced to 10 minutes.

Pilot Test Results. Results from the third tryout are reported in
Table 3.10. Seventy percent completed the entire test, but 84 percent
completed the separately timed first half and 79 percent completed the
second half. Thus, these results indicate that the test is probably still
a power test (recall our practical rule of thumb was 80 percent completing
all items) even with the reduced time limit. Test scores range from 11 to
28 with a mean of 21.8 (SD = 3.38).

Item difficulties range from .17 to 1.00 with a mean of .64 and
standard deviation of .19. Item-total correlations averaged .26 (SD = .14)
with a range of -.04 to .53. Parts 1 and 2 correlate .46. The split-half 
reliability estimate, corrected for test length, is .63 while the Hoyt 
value is .61. These values suggest that this is a more heterogeneous test 
of figural reasoning than is Reasoning Test 1. These data indicate that 
the test is acceptable in terms of score distribution, reliability, and 
power vs. speed continuum. 

The marker test, ETS Figure Classification, was administered at the
first two tryouts. Correlations between Reasoning 2 and its marker are .35
(N = 30 at Fort Carson) and .23 (N = 56 at Fort Campbell). These low
correlations are not too surprising, given the task requirement differences
and power versus speed component differences between these two measures.
Two other marker measures of induction, SRA Word Grouping and DAT Abstract
Reasoning, were administered at the third tryout. These data indicate that
Reasoning 2 correlates .48 with Word Grouping and .66 with Abstract
Reasoning (N = 118). Once again, these differences in correlations are
expected; as noted earlier, Word Grouping contains a verbal component
whereas Abstract Reasoning, like Reasoning 2, assesses induction using
figural items.
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Table 3.10 

Pilot Test Results from Fort Lewis: Reasoning Test 2

Split-Half Reliability (Spearman-Brown Corrected) = .63
Hoyt Internal Consistency = .61

Figure 3.20. Distribution of item difficulty levels: Reasoning Test 2.

Modifications for the Fort Knox Field Test. The only change made was
in the response format. Reasoning Test 2 contained 32 items with a
10-minute time limit for the Fort Knox field test.
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OVERALL ANALYSIS OF PILOT TEST RESULTS FOR 
COGNITIVE PAPER-AND-PENCIL MEASURES 



In this section, we analyze the data available as of August 1984 for 
the ten cognitive paper-and-pencil measures. This includes a summary of
pilot test score information, intercorrelations among the ten measures,
results from factor analyses, and data comparing subgroup test scores. 

Before providing a summary of the cognitive test data, a word about 
the source of these data and how they will be used is warranted. As noted, 
the bulk of the data reported here was obtained from the final pilot test
tryout at Fort Lewis. The sample size at Fort Lewis was sufficient for
many of the analyses performed (e.g., psychometric characteristics of test 
response). 

For some analyses, however, these data serve as a first step in struc- 
turing our understanding of these measures. For example, we provide re- 
sults from a factor analysis of the intercorrelations among the ten mea- 
sures. These data provide preliminary information about the underlying 
structure of the test score data. Another example of tentative conclusions 
stems from comparisons of subgroup test scores; for the most part, the 
sample sizes of the subgroups are fairly small and, therefore, results 
should not be viewed as conclusive. 

Table 3.11 summarizes the Fort Lewis data discussed earlier in this 
chapter. For each measure we include the number of test items, mean test
score and standard deviation, mean item difficulty level, and split-half 
reliability corrected for test length. Note that all data are based on a 
sample size of 118 with the exception of the Path Test data which is based 
on a sample size of 116. 

Test Intercorrelations and Factor Analysis Results

Table 3.12 contains the intercorrelation matrix for the ten cognitive 
ability measures. One of the most obvious features of this matrix is the 
high level of correlations across all measures. The correlations across 
all test pairs range from .40 to .68. These data suggest that the test 
measures overlap in the abilities assessed. 

This finding is not altogether surprising. For example, four of the 
ten measures were designed to measure spatial abilities such as visualiza- 
tion, rotation, and scanning. The Shapes Test, designed to measure field 
independence, also includes visualization components. The three tests 
constructed to measure spatial orientation involve visualization and
rotation tasks. The final two measures, Reasoning Test 1 and Reasoning
Test 2, also require visualization at some level to identify the principle
governing relationships among figures and to determine the similarities and
differences among figures. Thus, across all measures, abilities needed to
complete the required tasks overlap to some degree. This overlap is
demonstrated in the intercorrelation matrix.

To enable a better understanding of the similarities and differences 
among these measures or the underlying structure of these measures, the 
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Table 3.11 

Cognitive Paper-and-Pencil Measures:
Summary of Fort Lewis Pilot Test Results

                        No. of                     Mean Item    Split-
Measure                 Items     Mean      SD     Difficulty   Half*

SPATIAL VISUALIZATION
  Assembling Objects      40     28.14     7.51       .70        .7?
  Object Rotation         90     73.36    15.40       .82        .86

Scanning
  Path                    44     28.28     9.08       .64        .82
  Mazes                   24     19.30     4.35       .80        .78

FIELD INDEPENDENCE
  Shapes                  54       ?         ?        .54        .82

SPATIAL ORIENTATION
  Orientation 1          150    117.86    24.16       .79        .92
  Orientation 2           24     11.53     6.20       .48        .89
  Orientation 3           20      8.71     5.78       .4?        .88

REASONING
  Reasoning 1             30     19.64     5.75       .66        .78
  Reasoning 2             32     21.82     3.38       .64        .63

*All reliability estimates (split-halves with Part 1 and Part 2 separately
timed) have been corrected with the Spearman-Brown procedure.

? = entry or digit illegible in the source copy.
Table 3.12

Intercorrelations Among the Ten Cognitive Paper-and-Pencil Measures*

Measure                    1     2     3     4     5     6     7     8     9

 1. Assembling Objects    --
 2. Object Rotation      .53    --
 3. Path                 .52   .45    --
 4. Maze                 .59   .57   .60    --
 5. Shapes               .??   .50   .51   .56    --
 6. Orientation 1        .62   .52   .54   .?2   .56    --
 7. Orientation 2        .60   .45   .4?   .51   .47   .53    --
 8. Orientation 3        .62   .50   .40   .47   .60   .68   .65    --
 9. Reasoning 1+         .62   .52   .60   .58   .59   .59   .56
10. Reasoning 2+         .53   .?0   .48   .52   .51   .54   .53

*All correlations are computed from a sample size of 118 except those
involving the Path Test, which are based on a sample size of 116.

? = digit illegible in the source copy.
+ Row incomplete in the source copy (one entry missing for Reasoning 1,
two for Reasoning 2); the surviving values are listed in source order, so
their column positions are uncertain.
intercorrelation matrix was factor analyzed. A principal factors
extraction was performed with iterated, squared multiple correlations as
the communality estimates. Several solutions were computed, ranging from
two to five factors. The rotated orthogonal solution for four factors
appeared most psychologically meaningful. Results from this solution
appear in Table 3.13.

As shown in the table, to interpret results from the four-factor
solution we first identified all factor loadings of .35 or higher. Next,
we examined the factor loading pattern for each measure and then identified
measures with similar patterns to form test clusters. Five test clusters
or groups, labeled A through E, are identified in Table 3.13. These
clusters represent a first attempt to identify the underlying structure of
the cognitive measures included in the Pilot Trial Battery. Each test
cluster is described below:

Group A - Assembling Objects and Shapes Tests. Recall that the Shapes
Test requires the subject to locate or disembed simple forms from more
complex patterns, while the Assembling Objects Test requires the subject
to visualize how an object will appear when its components are put
together. Both measures require subjects to visualize objects or forms in
new or different configurations. Further, these measures contain both
power and speed components, with each falling more toward the speed end of
the continuum.

Group B - Object Rotation, Path, and Maze Tests. Object Rotation
involves two-dimensional rotation of objects or forms, while the Path and
Maze tests involve visually scanning a map or diagram to identify the best
pathway or the one pathway that leads to an exit. These measures are all
highly speeded; that is, subjects are required to perform the tasks at a
fairly rapid rate. Further, the tasks involved in each of these measures
appear less complex or easier than those involved in the Assembling Objects
or Shapes tests.

Group C - Orientation 1 and Orientation 3 Tests. Orientation Test 1
requires one to compare compass directions provided on a test circle and a
Given Circle, while Orientation Test 3 involves using a map, compass
directions, and present location to determine which direction to go to
reach a landmark on the map. Both measures require a subject to quickly
and accurately orient oneself with respect to directions on a compass and
landmarks in the environment despite shifts or changes in the directions.
Both are highly speeded measures of spatial orientation.

Group D - Orientation Test 2. This measure involves mentally rotating
a frame so that it corresponds to or matches up with the picture inside,
and then visualizing how components on the frame (a circle with a dot) will
appear after it has been rotated. This appears to be a very complex
spatial measure that requires several abilities such as visualization,
rotation, and orientation. In addition to the task complexity differences,
this measure may also differ from other spatial measures on the power-speed
continuum. Unlike the other spatial measures included in the Pilot Trial
Battery, Orientation 2 is a power rather than a speed test.

Group E - Reasoning 1 and Reasoning 2 Tests . Reasoning Test 1 re- 
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Table 3.13 

Rotated Orthogonal Factor Solution for Four Factors(a)

                               Factor
                       ----------------------
Test                    I     II    III    IV     h2(b)   Group

Shapes                 .47                        .568      A
Assembling Objects     .47   .48                  .621      A
Object Rotation        .50   .37                  .473      B
Path                   .55         .40            .541      B
Maze                   .76                        .727      B
Orientation 1          .39   .57                  .617      C
Orientation 3                .79          .35     .827      C
Orientation 2                .35          .74     .684      D
Reasoning 1            .39   .35   .67            .778      E
Reasoning 2            .37   .36   .44            .521      E

(a) Factor loadings of .35 or higher are shown.
(b) h2 = proportion of total test score variance in common with other
    tests, or common variance.
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quires one to identify the principle governing the relationship or pattern 
among several figures, while Reasoning Test 2 involves identifying
similarities among several figures to isolate the one figure that differs
from the others. As noted above, these measures appear to involve
visualization abilities. The reasoning task involved in each, however,
distinguishes
these measures from the other tests included in the Pilot Trial Battery. 

Results from analyses of the Fort Lewis data provide a preliminary 
structure for the cognitive paper-and-pencil tests designed for the Pilot 
Trial Battery. Correlations among the measures indicate that all measures 
require spatial visualization abilities at some level. The measures may, 
however, be distinguished by the type of task, task complexity, and speed 
and power component differences. 
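
The clustering step described in this section (flag loadings of .35 or
higher, then group tests with similar loading patterns) can be sketched
in a few lines of Python. This is only an illustration under stated
assumptions: the function names are ours, the loadings are made-up values
rather than those in Table 3.13, and grouping by an exactly matching set of
salient factors is a simplification of the judgment-based grouping
reported here. The last helper shows the usual communality for an
orthogonal solution, the sum of squared loadings.

```python
from collections import defaultdict

SALIENT = 0.35  # cutoff for a salient loading, as stated in the text

def salient_pattern(loadings, cutoff=SALIENT):
    """Indices of the factors on which a test loads at or above the cutoff."""
    return tuple(i for i, v in enumerate(loadings) if abs(v) >= cutoff)

def cluster_by_pattern(loading_table, cutoff=SALIENT):
    """Group tests that share exactly the same set of salient factors."""
    clusters = defaultdict(list)
    for test, loadings in loading_table.items():
        clusters[salient_pattern(loadings, cutoff)].append(test)
    return dict(clusters)

def communality(loadings):
    """h2 for an orthogonal solution: the sum of squared loadings."""
    return sum(v * v for v in loadings)

# Hypothetical loadings on four factors (NOT the values in Table 3.13).
example = {
    "Test W": [0.70, 0.10, 0.20, 0.05],
    "Test X": [0.60, 0.20, 0.10, 0.15],
    "Test Y": [0.15, 0.65, 0.40, 0.10],
    "Test Z": [0.10, 0.55, 0.45, 0.20],
}
print(cluster_by_pattern(example))
```

In practice, tests whose salient factors differ slightly may still be
judged to belong together, so output of this kind is a starting point for
interpretation rather than a substitute for it.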

Subgroup Analyses Results

Mean test scores were compared for two pairs of subgroups: (a) blacks
and whites, and (b) males and females. The sample sizes for each subgroup
are fairly small with the exception of the male subsample (N = 97).
Consequently, reported differences are intended to provide only a
"ball-park" estimate of the mean effect size differences between the
subgroups. It is important to note that the reported subgroup differences
may, in fact, be inaccurate estimates of the true differences in the target
population. This may occur for several reasons, such as restriction in
range of test score data due to selection and, primarily, sampling error
because of the small samples used here.

Table 3.14 contains the mean effect size differences for blacks and
whites on the various tests. The differences for these groups range
from .63 to 1.17. Note that the largest differences appear in Orientation
Test 1 (mean effect size = 1.17), Assembling Objects Test (mean effect size
= 1.10), and the Shapes Test (mean effect size = 1.06). The smallest
differences appear for Object Rotation Test (mean effect size = .63) and
Reasoning Test 2 (mean effect size = .72). These differences are in line
with the size of white-black differences usually found with cognitive,
paper-and-pencil tests.
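
The mean effect size is defined in the footnote to Table 3.14 as the
difference between subgroup means divided by the pooled standard deviation.
The sketch below works one entry through in Python. The report does not
spell out how the standard deviations were pooled; weighting each variance
by n - 1 is our assumption, and the function names are ours as well. That
assumption reproduces the tabled value for the Path Test.

```python
from math import sqrt

def pooled_sd(n1, sd1, n2, sd2):
    """Pooled standard deviation, weighting each variance by n - 1
    (an assumed pooling rule; the report does not state one)."""
    return sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))

def effect_size(n1, mean1, sd1, n2, mean2, sd2):
    """Mean difference expressed in pooled standard deviation units."""
    return (mean1 - mean2) / pooled_sd(n1, sd1, n2, sd2)

# Path Test figures from Table 3.14: N, mean, SD for each subgroup.
d = effect_size(65, 30.35, 8.80, 30, 22.97, 8.84)
print(round(d, 2))
```

For the Path Test the pooled standard deviation is about 8.81, so the
7.38-point difference in means corresponds to an effect size of about .84.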

Table 3.15 contains mean effect size differences for males versus 
females on each of the ten measures. Mean effect size differences range 
from .05 to .87. The largest difference appears for the Object Rotation 
Test while the smallest difference appears for Orientation Test 2. These 
gender differences represent values somewhat lower than those usually found 
in the literature, indicating that they may be underestimates for the 
target population. 

Once again, however, we emphasize strongly that these results are 
suggestive only, due to the small sample sizes. 

Other Cognitive Tests 

In this chapter we have focused on the cognitive paper-and-pencil 
measures. Other cognitive measures were administered in the Pilot Trial 
Battery; those measures were administered via computer and are described in
Chapter 5. Correlations among the cognitive paper-and-pencil tests and the 
cognitive computer tests are also reported in that chapter. Before de- 
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Table 3.14 



Subgroup Analyses of Cognitive Paper-and-Pencil Tests:
White-Black Mean Score Differences in Pilot Test

                                 Whites               Blacks            Mean
                       No.                                            Effect
Construct & Test  Possible     N    Mean     SD    N    Mean     SD  Size(a)

SPATIAL VISUALIZATION (Rotation)
  Assembling Objects    40     ?       ?      ?    ?   23.47   8.37     1.10
  Object Rotation       90     ?   77.??      ?   30   67.97  17.65      .63

SPATIAL VISUALIZATION (Scanning)
  Path                  44    65   30.35   8.80   30   22.97   8.84      .84
  Maze                  24    66   20.58   3.88   30   16.57   4.31     1.00

FIELD INDEPENDENCE
  Shapes                54    66   33.03   8.31   30   24.50   7.37     1.06

SPATIAL ORIENTATION
  Orientation 1        150    66  127.65  19.54   30  104.00  21.89     1.17
  Orientation 2         24    66   13.33   6.35   30    8.53   4.98      .81
  Orientation 3         20    66   10.80   5.43   30    6.20   5.13      .86

REASONING
  Reasoning 1           30    66   21.53   5.12   30   17.17   5.50      .83
  Reasoning 2           32    66   22.73   3.46   30   20.23   3.56      .72

(a) Mean effect size = (Mean (Whites) - Mean (Blacks)) / Pooled Standard
    Deviation

This statistic provides an estimate of the difference in test score
performance expressed in standard deviation units.

? = entry or digit illegible in the source copy.

Table 3.15

Subgroup Analyses of Cognitive Paper-and-Pencil Tests:
Male-Female Mean Score Differences in Pilot Test

                                 Males                Females           Mean
                       No.                                            Effect
Construct & Test  Possible     N    Mean     SD    N    Mean     SD  Size(a)

SPATIAL VISUALIZATION (Rotation)
  Assembling Objects    40    97   28.43   7.68   21   26.81   6.47      .22
  Object Rotation       90    97   75.63  14.37   21   62.90  15.67      .87

SPATIAL VISUALIZATION (Scanning)
  Path                  44    95   28.62   9.55   21   26.76   6.29      .21
  Maze                  24    97   19.80   4.13   21   16.95   4.57      .68

FIELD INDEPENDENCE
  Shapes                54    97   29.82   9.07   21   26.76   8.99      .34

SPATIAL ORIENTATION
  Orientation 1        150    97  119.01  24.47   21  112.52  21.93      .27
  Orientation 2         24    97   11.59   6.28   21   11.29   5.85      .05
  Orientation 3         20    97    8.93   5.65   21    7.71   6.27      .21

REASONING
  Reasoning 1           30    97   19.76   5.63   21   19.05   6.26      .12
  Reasoning 2           32    97   21.91   3.76   21   21.43   2.32      .14

(a) Mean effect size = (Mean (Males) - Mean (Females)) / Pooled Standard
    Deviation

This statistic provides an estimate of the difference in test score
performance expressed in standard deviation units.
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scribing the computer-administered tests, we provide results from the field 
test analyses of the paper-and-pencil cognitive measures in Chapter 4. 
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CHAPTER 4 



COGNITIVE PAPER-AND-PENCIL MEASURES: FIELD TEST 
Marvin D. Dunnette, VyVy A. Corpuz, and Jody L. Toquam



In this chapter we describe analyses of the field test of the cogni- 
tive paper-and-pencil tests in the Pilot Trial Battery, administered at 
Fort Knox in September 1984. The procedures and sample for this field test 
were described in Chapter 2. In this chapter we present descriptive sta- 
tistics for the tests, internal consistency and test-retest reliabilities, 
an analysis of gains in scores when the tests are taken a second time, and 
analyses of the relationships between the ASVAB subtests and the Pilot 
Trial Battery cognitive tests. Later chapters of this report will extend 
analysis of the data from the field tests to cover the relationships of the
cognitive paper-and-pencil measures with the other measures--
computer-administered perceptual/psychomotor, and non-cognitive
paper-and-pencil--which were also part of the Pilot Trial Battery. We note
here that parts of this chapter are drawn from Toquam et al. (1985).

A concise description of each of the ten tests, along with a sample 
item or items from each test, is contained in Figure 4.1. Copies of the 
full Pilot Trial Battery as administered at Fort Knox are contained in 
Appendix 6. 
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ANALYSES OF DATA FROM FIELD TEST ADMINISTRATION 



Mean Scores and Reliability Estimates

Table 4.1 shows the means, standard deviations, and three estimates of 
the reliabilities of the cognitive tests administered in the field test of 
the Pilot Trial Battery. The means and standard deviations are similar to 
the results obtained at the last pilot test at Fort Lewis (see Table 3.11),
except for two tests. The mean score for Object Rotation is about 14
points lower for the field test (59.62 vs. 73.36), but this was expected
and intended since we had decreased the time allowed on this test from 8
minutes to 7.5 minutes in order to avoid a possible ceiling effect.
Orientation Test 1 also showed a mean score decrease, from 117.86 to 88.65.
No changes had been made in the test so it is not clear why this occurred. 
The decrease is not alarming, however, since the examinees still answered 
about .59 of the items correctly which is in the range of test difficulty 
we desired (about .50 to .70) for this set of tests. 

Difficulty levels for the other tests are also in this .50 to .70 
range, except for Orientation 3. (The test difficulties are not shown in 
Table 4.1 but can be obtained by dividing the mean score by the total 
number of items.) This test appears to be a bit more difficult than
desired (difficulty = .39), but this appears not to adversely affect the
test score variance (standard deviation = 5.68) or its reliability
(split-half reliability = .88 and test-retest reliability = .84).
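
The test difficulty calculation described above (mean score divided by
number of items) is simple enough to sketch directly. The Python below is
illustrative only; the function names are ours, and the .50 to .70 bounds
are taken from the target range stated in the text.

```python
def difficulty_index(mean_score, n_items):
    """Average proportion of items answered correctly."""
    return mean_score / n_items

def in_target_range(p, low=0.50, high=0.70):
    """True if a difficulty value falls in the desired .50 to .70 range."""
    return low <= p <= high

# Orientation Test 1, Fort Knox field test: mean 88.65 on 150 items.
p = difficulty_index(88.65, 150)
print(round(p, 2), in_target_range(p))
```

The same division applied to Object Rotation (59.62 on 90 items) gives
about .66, also inside the range, whereas the .39 reported for
Orientation 3 falls below it.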

Three estimates of reliability are shown in Table 4.1. The first
one, labeled split-half, is actually computed on the Fort Lewis pilot test
data, not on the Fort Knox field test data. Separately timed halves were
administered at Fort Lewis, but time limitations did not allow this at Fort
Knox. We have included these estimates because they are more appropriate
than coefficient Alpha for those tests that are moderately or highly
speeded. All of the Pilot Trial Battery cognitive tests are at least
moderately speeded, except Orientation 2, Reasoning 1, and Reasoning 2.

Examination of these reliability estimates shows that all of the tests 
are acceptably reliable, with the possible exception of Reasoning 2. The 
estimates of internal consistency (split half and coefficient Alpha) 
are .78 or higher, except for Reasoning 2, and the test-retest reliability
estimates (two-week interval) are .64 or higher, except for this test.

Gain Score Analysis

The collection of retest data allowed us the opportunity to examine 
the extent to which test score distributions might change when the tests 
are taken a second time. Generally speaking, prior exposure to a test 
leads to an increase in test scores, especially if the exposure is very 
close to the time the test is taken. In this case, the soldiers completed 
all the cognitive tests twice, with a two-week interval between administra- 
tions. 

Our concern was that taking the test a second time might lead to a
large increase in scores. If so, this would need to be taken into account 
if the tests were used in an operational setting. (Retest opportunities 
could be controlled or limited, or parallel forms could be developed.)

Cognitive Paper-and-Pencil Measures

CONSTRUCT/MEASURE       DESCRIPTION OF TEST       SAMPLE ITEM

SPATIAL VISUALIZATION - ROTATION

Assembling Objects

    The test contains 40 items with a 16-minute time limit. The subject's
    task involves figuring out how an object will look when its parts are
    put back together again. There are two types of problems in the test.
    In one part, the item shows a picture of labelled parts. By matching
    the letters, it can be "seen" where the parts should touch when the
    object is put together correctly. The second type of problem does not
    label any of the parts. The parts fit together like the pieces of a
    puzzle. In each section, four possible [illegible] are provided and
    the subject must pick the correct one.

    [Sample items: Example 1 and Example 2]

Object Rotation

    The test contains 90 items with a 7 1/2 minute time limit. The
    subject's task involves examining a test object and determining
    whether the figure represented in each item is the same as the test
    object, only rotated, or is not the same as the test object (e.g.,
    flipped over). For each test object there are 5 test items, each
    requiring a response of "same" or "not same".

    [Sample item: example test objects]

Figure 4.1. Description of Cognitive Paper-and-Pencil Measures in Field Test
(Page 1 of 4)
Cognitive Paper-and-Pencil Measures

CONSTRUCT: SPATIAL VISUALIZATION - FIELD INDEPENDENCE

MEASURE: Shapes

DESCRIPTION OF TEST: The test contains 54 items with a 16-minute time
limit. At the top of each test page are five simple shapes; below these
shapes are six complex figures. Subjects are instructed to examine the
simple shapes and then to find the one simple shape located in each
complex figure.

SAMPLE ITEM: [Graphic not legible]

CONSTRUCT: SPATIAL VISUALIZATION - SCANNING

MEASURE: Path

DESCRIPTION OF TEST: The test contains 44 items with an 8-minute time
limit. Subjects are required to determine the best path or route between
two points. Subjects are presented with a map of airline routes or flight
paths. The subject's task is to find the "best" path, that is, the path
between two points that requires the fewest number of stops.

SAMPLE ITEM: [Graphic: four numbered routes between lettered points, each
with answer circles for the number of stops]

MEASURE: Mazes

DESCRIPTION OF TEST: The test contains 24 items with a 5 1/2 minute time
limit. Each item is a rectangular maze with four labelled entrance points
and four exit points. The task is to determine which of the four entrances
leads to a pathway through the maze and to one of the exit points.

SAMPLE ITEM: [Graphic: maze with labelled entrances]

Figure 4.1. Description of Cognitive Paper-and-Pencil Measures in Field Test
(Page 2 of 4)
Cognitive Paper-and-Pencil Measures

CONSTRUCT: INDUCTION

MEASURE: Reasoning 1

DESCRIPTION OF TEST: The test contains 30 items with a 12 minute time
limit. Subjects are presented with a series of four figures. The task is
to identify the pattern or relationship among the figures and then to
identify from among five possible answers the one figure that appears
next in the series.

SAMPLE ITEM: [Graphic: figure series with five possible answers]

MEASURE: Reasoning 2

DESCRIPTION OF TEST: The test contains 32 items with a 10 minute time
limit. Subjects are presented with five figures. They are then asked to
determine which four of the figures are similar in some way, thereby
identifying the one figure that differs from the others.

SAMPLE ITEM: [Graphic: five figures]

Figure 4.1. Description of Cognitive Paper-and-Pencil Measures in Field Test
(Page 3 of 4)
Cognitive Paper-and-Pencil Measures

CONSTRUCT: SPATIAL ORIENTATION

MEASURE: Orientation 1

DESCRIPTION OF TEST: The test contains 150 items (30 5-item sets) with a
10 minute time limit. Each set presents subjects with 6 circles. The
first, the Given Circle, indicates the compass direction for North. For
most items, North is rotated out of its conventional position (e.g., the
top of the circle does not necessarily represent North). Compass
directions also appear on the remaining five circles. The subject's task
is to determine, for each circle, whether or not the direction indicated
is correctly positioned by comparing it to the direction of North in the
Given Circle.

SAMPLE ITEM: [Graphic: Given Circle followed by five circles]

MEASURE: Orientation 2

DESCRIPTION OF TEST: The test contains 24 items with a 10 minute time
limit. Each item contains a picture within a circular or rectangular
frame. The bottom of the frame has a circle with a dot inside it. The
picture or scene is not in an upright position. The task is to mentally
rotate the frame so that the bottom of the frame is positioned at the
bottom of the picture. After doing so, the subject must then decide where
the dot will appear in the circle.

SAMPLE ITEM: [Graphic not legible]

MEASURE: Orientation 3

DESCRIPTION OF TEST: The test contains 20 items with a 12 minute time
limit. Subjects are presented with a map that includes various landmarks
such as a barracks, a campsite, a forest, a lake, and so on. Within each
item, subjects are provided with compass directions by indicating the
direction of one landmark to another, such as "the forest is North of the
campsite". Subjects are also informed of their present location relative
to another landmark. Given this information, the subject must determine
which direction to go to reach yet another structure or landmark.

SAMPLE ITEM: [Graphic: map of landmarks]
1. The shed is due north of the [...]. You are at the storage tank.
Which direction must you travel to reach the [...]?

Figure 4.1. Description of Cognitive Paper-and-Pencil Measures in Field Test
(Page 4 of 4)
Table 4.1

Means, Standard Deviations, and Reliability Estimates for the Fort Knox
Field Test of the Ten Cognitive Paper-and-Pencil Tests

                               Time                         Reliability Coefficients^b
                             Allotted                  -------------------------------------------
                      No.      (in        Score        Split Half   Coefficient    Test-Retest
Test                 Items   minutes)   Mean^a   SD^a   (N = 118)      Alpha     (N = 97 to 126)
--------------------------------------------------------------------------------------------------
Assembling Objects     40      16       26.45    8.67     .79          [?]           .74
Object Rotation        90       7.5     59.62   18.98     .86          [?]           .75
Path                   44       8       26.37    8.86     .82          .92           .64
Maze                   24       5.5     17.76    4.45     .78          .89           .71
Shapes                 54      16       26.39   10.21     .82          .92           .70
Orientation 1         150      10       88.65   34.74     .92          .98           .67
Orientation 2          24      10       11.46    5.96     .89          .88           .80
Orientation 3          20      12        7.73    5.68     .88          .90           .84
Reasoning 1            30      12       19.57    5.23     .78          .83           .64
Reasoning 2            32      10       21.[?]   3.63     .63          .65           .57

^a Ns range from 292 to 298 for mean and SD calculations.

^b The split-half coefficient is computed on pilot test data from [...],
where separately timed halves were given, and is corrected to full test
length. Coefficient alpha estimates are based on the Fort Knox data and
are overestimates for the speeded tests. The test-retest interval was two
weeks.
Table 4.2 shows the gain scores for persons in the retest sample.
Four of these tests showed gain scores that appeared to be higher than we
thought desirable: Shapes, Orientation 1, Path, and Object Rotation. In
order to estimate the seriousness of this concern we located gain scores
for a number of other cognitive tests that measured similar constructs. We
found that gain scores of similar magnitude occurred on those tests as well
(e.g., on General Aptitude Test Battery tests of spatial aptitude and form
perception, gain scores ranged from .46 to .62; U.S. Department of Labor,
1970). Although this finding did not resolve the concern with these rela-
tively large, undesirable gain scores, it did indicate that gain scores of
this magnitude are not uncommon for tests of this type.

Inspection of the last two columns in Table 4.2 indicated that much of 
the gain probably occurred because the soldiers attempted more items the
second time they took the test. This is certainly to be expected since the 
retested soldiers would be more familiar with item types and instructions. 

The gain score analysis showed that persons could, on the average, 
increase their scores on several of the PTB cognitive tests to a degree 
that seems to be cause for some concern in an operational setting. How- 
ever, a brief review of the literature showed that gain scores of the 
magnitude we found were also found for commonly used, published tests of 
the same type. This suggests that our level of concern may be unduly
high.
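
The gain index reported in Table 4.2 can be illustrated with a short Python sketch. It assumes that the index is the Time 2 minus Time 1 mean difference divided by the average of the two standard deviations, an interpretation that reproduces the tabled values; the function name is hypothetical and is not part of the original analysis.

```python
def standardized_gain(mean_1, sd_1, mean_2, sd_2):
    """Gain in SD units: (M2 - M1) divided by the average of the two SDs.

    Assumed form of the Table 4.2 gain index, not a documented formula.
    """
    average_sd = (sd_1 + sd_2) / 2.0
    return (mean_2 - mean_1) / average_sd


# (Time 1 mean, Time 1 SD, Time 2 mean, Time 2 SD) as reported in Table 4.2.
table_4_2 = {
    "Assembling Objects": (25.68, 9.13, 28.23, 8.84),
    "Object Rotation": (61.23, 19.60, 71.34, 15.92),
    "Path": (27.43, 8.43, 32.46, 7.83),
}

for test, (m1, sd1, m2, sd2) in table_4_2.items():
    print(f"{test}: {standardized_gain(m1, sd1, m2, sd2):.2f}")
# Assembling Objects: 0.28
# Object Rotation: 0.57
# Path: 0.62
```

Expressing the gain in standard deviation units is what allows the comparison with gains reported for published tests that have different score scales.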

Covariance with ASVAB Subtests

One of the primary goals, and criteria for evaluation of our success, 
was the development of new predictor measures that would complement the
ASVAB rather than measure the same things (see Chapter 1 for a discussion 
of the overall strategy of predictor development). In order to evaluate 
our progress toward that goal, we analyzed the covariance of the Pilot 
Trial Battery with the ASVAB. In this section we report the correlations 
between these measures and a statistic, called uniqueness, that indicates 
the amount of overlap between one test and a set of other tests. 

We take up the correlations first. If we had achieved our goal of 
complementing the ASVAB, then the PTB cognitive tests should correlate low 
to moderately with the ASVAB subtests. 

Table 4.3 contains the intercorrelations for the ASVAB subtests and 
the cognitive paper-and-pencil measures. Note that we have also included
scores on the Armed Forces Qualification Test (AFQT). These correlations
are based on the Fort Knox field test sample, but include only those
subjects with test scores available on all variables (N = 168).

In examining these relationships, we first looked at the correlations 
between tests within the same battery. Correlations between ASVAB subtest 
scores range from .02 to .74 (absolute values). The range of intercorrela-
tions is a bit more restricted when examining the relationships between the
cognitive paper-and-pencil test scores (.27 to .67). This range of values 
reflects the fact that the Pilot Trial Battery measures were designed to 
tap fairly similar cognitive constructs. 
Table 4.2

Gains on Pilot Trial Battery Cognitive Tests for Persons Taking Tests at
Both Time 1 and Time 2

                                                                              Items Attempted by
                                           Time 1           Time 2             75% of Subjects
                      No.      No.     --------------   --------------         -----------------
Test                 Items  Subjects    Mean     SD      Mean     SD    Gain^a  Time 1   Time 2
------------------------------------------------------------------------------------------------
Assembling Objects     40     113      25.68    9.13    28.23    8.84    0.28     32       40
Object Rotation        90     125      61.23   19.60    71.34   15.92    0.57     55       69
Path                   44     126      27.43    8.43    32.46    7.83    0.62     28       36
Maze                   24      97      17.47    4.28    18.52    4.34    0.24     17      [?]
Shapes                 54     121       [?]    10.71     [?]    11.50   0.6[?]    30       42
Orientation 1         150     123      91.8    33.05   112.49   32.01    0.63    [?]      110
Orientation 2          24     116      11.64    5.99    12.31    6.12    0.11     24       24
Orientation 3          20     117       7.71    5.63     8.11    5.60    0.08     16       19
Reasoning 1            30     117      20.35    5.03    21.15    5.49    0.15     30       30
Reasoning 2            32     121      21.22    3.76    21.88   [?].49   0.17     32       32

^a Gain = (M2 - M1) / [(SD1 + SD2) / 2]



Table 4.3

Intercorrelations Among the ASVAB Subtests and the Cognitive Paper-and-Pencil
Measures in the Pilot Trial Battery: Fort Knox Sample (N = 168)

               AFQT    GS    AR    WK    PC    NO    CS    AS    MK    MC    EI    AO    OR    SH    MZ    PA    R1    R2    O1    O2
ASVAB
AFQT Score
Gen Science      61
Arith Reas       87    54
Word Know        81    66    61
Parag Comp       69    43    53    58
Numb Ops         44    02    21    06    14
Code Spd         30   [?]    14    10    14    55
Auto/Shop        45    54    50    45    29   -08   -07
Math Know        76    60    74    62    54    19   [?]    43
Mech Comp        55    50    62    54    36   -10   -06    64    57
Elec Info        56    66    56    55    39 -0[?]    01    71    59    63

PTB Paper-and-Pencil Tests
Assmbl Obj       44    38    48    40   [?]   -01    19    39    48    57    38
Obj Rotat        30    20    33    18    13    18    13    27    29    36    32    47
Shapes           35    29    33    28    20    14    23    17    36    35    26    51    50
Maze             32    23    38    16    12    19    19    34    36    44    35    60    58    47
Path             35    20    37    21    17  [?]4    31    33    35    42    32    54    59    50    63
Reas 1           46    33    48  4[?]    33    04    15    29    47    51    32    66    38    41    51    52
Reas 2           41    32    44    34    34    02    12    18    49    36    30    52    27    44    33    37    50
Orient 1         45    41    45    35    29    14    20    40    47    50    39    52    55    54    55    58    49    41
Orient 2         50    29    52    38    33    18    13    36    48    49    35    51  3[?]    42    49    42    48    47    55
Orient 3         59    51    62    48    43    05    15    55    60    63    54    61    42    42    51    46    49    53    67    56

Note. Decimal points omitted. Columns appear in the same order as the rows
(AFQT = AFQT Score, GS = Gen Science, ..., O2 = Orient 2).
Examining the correlations between the ASVAB subtests and the PTB
cognitive paper-and-pencil tests, we find that the correlations range from
-.01 (Assembling Objects and Number Operations) to .63 (Orientation 3 and
Mechanical Comprehension). The mean correlation is .33 (SD = .14). Note
that across all PTB paper-and-pencil tests, ASVAB Mechanical Comprehension
appears to correlate the highest with the new tests. Across all ASVAB
subtests, Orientation 3 yields the highest correlations.

These results show that our goal of complementing the ASVAB has 
largely been achieved. Certainly, the ASVAB subtests and PTB tests are 
correlated, but not highly. As noted above, the mean correlation is .33 
which is moderate for the average correlation between paper-and-pencil, 
cognitive tests. This complementary nature of the PTB is shown even more 
straightforwardly by the uniqueness analyses. 

Uniqueness Estimates of Cognitive Tests 

Table 4.4 shows uniqueness estimates for the ten cognitive paper-and-
pencil tests. Uniqueness is estimated by subtracting the squared multiple
correlation obtained by regressing the test of interest on a set of tests
(in this case the ASVAB or PTB) from the reliability estimate for the test
of interest (U² = Rxx - R²). [See Wise and Mitchell (1985) for discussion
of this estimate.] Uniqueness is, then, the amount of reliable variance
for a test not shared with the tests against which it has been regressed.
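
As a minimal sketch of the uniqueness computation, the Python fragment below subtracts a squared multiple correlation from a reliability estimate, using three rows of Table 4.4 as input. The function names are hypothetical. The shrinkage helper uses the common adjusted R-squared formula and is only an assumption about what the "standard procedure" mentioned in the table footnote refers to.

```python
def adjusted_r_squared(r_squared, n_subjects, n_predictors):
    """Common shrinkage correction for R-squared (assumed, not confirmed by the source)."""
    return 1.0 - (1.0 - r_squared) * (n_subjects - 1) / (n_subjects - n_predictors - 1)


def uniqueness(reliability, r_squared):
    """Reliable variance not shared with the predictor set: U^2 = Rxx - R^2."""
    return reliability - r_squared


# (split-half reliability, shrunken R^2 with ASVAB subtests) from Table 4.4.
against_asvab = {
    "Object Rotation": (0.86, 0.19),
    "Path": (0.82, 0.29),
    "Orientation 1": (0.92, 0.36),
}

for test, (rxx, r2) in against_asvab.items():
    print(f"{test}: U^2 = {uniqueness(rxx, r2):.2f}")
# Object Rotation: U^2 = 0.67
# Path: U^2 = 0.53
# Orientation 1: U^2 = 0.56
```

Because the reliability sets the ceiling on explainable variance, a test with modest overlap but low reliability (for example, Reasoning 2) can still show limited uniqueness.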

The hope was that the PTB tests would have high uniqueness when 
regressed against the ASVAB. Such results would indicate that the PTB 
tests complement the ASVAB when all of the ASVAB subtests are taken into 
account simultaneously, and that the necessary condition for incrementing 
the ASVAB validity (against job performance) would be present. As Table 
4.4 shows, the uniqueness estimates for the PTB when regressed against the 
ASVAB subtests ranged from .34 (Orientation 3) to .67 (Object Rotation). 
These estimates are encouraging since there is ample room for incremental
validity to occur. 

We point out, however, that the ASVAB tests and PTB tests were not 
administered concurrently. The ASVAB was taken prior to time of entry into 
the service and the PTB tests were administered to the soldiers about one- 
and-one-half years, on the average, after they entered the service. This 
non-concurrent administration operates to reduce the correlation between 
the two sets of tests, but to an unknown degree. Thus, these uniqueness 
estimates are overestimates by some unknown amount. 

Table 4.4 also shows the R² and U² for each PTB test when regressed
against all the other PTB tests. These U² values were expected to be much
lower than the U² values obtained by regressing each PTB test against the
ASVAB subtests, since the PTB tests measure constructs more similar to each
other than the constructs in the ASVAB; indeed, they are about 10 to 20
points lower, except for Orientation 3, which is only 4 points lower.

The results of the analyses of the covariance of ASVAB with PTB show 
that there is moderate overlap between the two batteries. There appears to 
be a relatively large amount of reliable variance in the PTB cognitive tests 
that is not accounted for by the ASVAB. This is the necessary condition 
that must be obtained in order to increment the validity of ASVAB for 
Table 4.4

Uniqueness Estimates for Cognitive Tests in Pilot Trial Battery (PTB)
Against Tests in PTB and Against Tests in ASVAB

                                    Other PTB Tests       ASVAB Tests
                                    ---------------     ---------------
Test                  Split Half     R²*      U²**       R²*      U²**
-----------------------------------------------------------------------
Assembling Objects       .79         [?]      [?]        .40      .39
Object Rotation          .86         .42      [?]        .19      .67
Path                     .82         .51      .31        .29      .53
Maze                     .78         .45      .32        .25      .53
Shapes                   .82         [?]      .43        .19      .63
Orientation 1            .92         .58      .34        .36      .56
Orientation 2            .89         .45      .44        .30      .59
Orientation 3            .88         .58      .30        .54      .34
Reasoning 1              .78         .45      .33        .29      [?]
Reasoning 2              .63         .37      .26        .26      .37

*The R² with the other cognitive paper-and-pencil tests and with the
ASVAB subtests are the squared multiple regression coefficients cor-
rected for shrinkage using the standard procedure in the Statistical
Analysis System (SAS) software package.

**Uniqueness estimates (U²) were computed using the split-half reli-
ability estimate. The uniqueness is equal to the split-half reliabil-
ity minus the R² with the ASVAB or with other paper-and-pencil tests.
Summary of Analyses

The field test analyses showed that the PTB cognitive tests were, for
the most part, in excellent shape. The tests had adequate to excellent
score distributions and reliabilities, with one test having marginal reli-
ability (Reasoning 2). Four of the ten tests appeared to be susceptible to
large increases in test scores when they are taken a second time, but
apparently no more so than commonly used published tests. Finally, the PTB
cognitive tests do appear to complement the ASVAB, and possess enough
reliable score variance that is uncorrelated with ASVAB to allow the possi-
bility of substantial incremental validity for job performance.

As we noted in the opening of this chapter, the relationships of the
PTB cognitive, paper-and-pencil tests to other parts of the Pilot Trial
Battery are covered in later chapters of this report.
CHAPTER 5



PERCEPTUAL/PSYCHOMOTOR COMPUTER-ADMINISTERED MEASURES:

PILOT TESTING

Rodney L. Rosse, Norman G. Peterson, Jeffrey J. McHenry,
Jody L. Toquam, Janis S. Houston, and Teresa L. Russell



GENERAL



In Chapter 1 (see Computer Battery Development), we provided a de-
scription of the early development of the computer-administered measures.
We focused on site visits to military laboratories to investigate other
efforts to develop computer-administered tests, choice of appropriate hard-
ware, acquisition of appropriate hardware, choice of appropriate computer
languages, and a strategy for melding the efforts of programming the com-
puter with the input of staff scientists responsible for developing the
various tests.

In that chapter we briefly described early tryouts of the computer-
administered measures at the Minneapolis Military Entrance Processing Sta-
tion and at the Fort Carson pilot test. We add here that these early
tryouts focused primarily on (1) making sure the computer programming was
working correctly, (2) the general reactions of soldiers to a computer-
administered battery, especially the test instructions, and (3) the general
effectiveness of commercially available equipment (keyboards and "computer
game" joysticks) for acquiring examinee responses. Actual analysis of the
test responses themselves was secondary during that phase of the research;
however, we learned much that shaped the way the tests were programmed, the
instructions and items that were presented, and the way responses were
acquired. Most notably, we decided it was necessary to develop a custom-
made response pedestal to acquire responses.

This chapter, then, focuses on the tests that were developed for
computer administration and the constructs they were designed to measure.
We developed tests to measure three cognitive ability constructs: Reaction
Time (or Processing Efficiency), Perceptual Speed and Accuracy, and Memory,
as well as three psychomotor constructs: Precision/Steadiness, Multilimb
Coordination, and Movement Judgment. All but two tests were developed in
time for the Fort Lewis pilot test. These two tests were included in the
field test at Fort Knox (they were Number Memory and Cannon Shoot, intended
as measures of the Memory and Movement Judgment constructs, respectively).

We turn now to the discussion of the development of the tests and the 
results of the pilot test at Fort Lewis. (Chapter 6 presents the analysis 
of the Fort Knox field test data). 

Test Development 

In our discussion of constructs, we first provide the definition and
rationale for including each. Following this, the source or model used to
develop each test is described, along with changes or modifications made
prior to the Fort Lewis pilot test, if any. Results from the Fort Lewis
pilot test are then described in detail. For example, we describe parame-
ters used to develop test items and results from analyses of parameter
data. Further, test characteristics, such as time required to read in- 
structions and to complete the test, and test score information are pro- 
vided along with recommended scoring procedures. For each test, we also 
highlight correlations with other computer measures and with cognitive 
paper-and-pencil measures. Finally, modifications or test revisions made 
on the basis of Fort Lewis pilot data are described. 

We conclude this chapter by summarizing computer test results obtained 
from the Fort Lewis pilot test. 

Before describing the tests designed to measure target constructs, we 
briefly describe a critical piece of equipment designed especially for 
pilot administrations of the computerized tests in the Pilot Trial Battery. 

Development of Response Pedestal 

The microprocessor selected for use, the COMPAQ, contains a standard 
keyboard. As reported in Chapter 1 and mentioned above, in early tryouts 
of the computer battery subjects were asked to make their responses on this
keyboard. From these preliminary administrations, we determined that the 
keyboard may provide an unfair advantage to subjects with typing or data 
entry experience. Furthermore, use of a standard keyboard did not provide 
adequate experimental control during the testing process. Therefore, a 
separate response pedestal was designed and built. 

This response pedestal is depicted in Figure 5.1. The pedestal is 
approximately 21 inches from side to side and 10 inches from front to back. 
Note that it contains two joy sticks (one for left-handed subjects and one 
for right-handed subjects), "HORIZONTAL" and "VERTICAL" controls, a dial 
for entering demographic data such as age and social security number, two 
red buttons, three response buttons— blue, yellow, and white--and four 
green "home" buttons. (One of the "home" buttons is not visible in the 
diagram; it is located on the left side of the pedestal.) The "SELECTOR" 
control was not used by the examinee to make responses, but was necessary 
to properly connect the appropriate controls to the computer for each test. 

The "home" buttons play a key role in capturing subjects' reaction 
time scores. They control the onset of each test item or trial when 
reaction time is being measured. To begin a trial, the subject must place 
his/her hands on the four green buttons. After the stimulus appears on the 
screen and the subject has determined the correct response, he/she must 
remove his/her preferred hand from the "home" buttons and press the correct 
response button. 

The procedure involving the "home" buttons serves two purposes. 
First, control is added over the location of the subjects' hands while the 
stimulus item is presented. In this way, hand movement distance is the 
same for all subjects and variation in reaction time due to position of
subjects' hands is reduced to nearly zero. 

Second, procedures involving these buttons are designed to assess two 
theoretically important components of reaction time measures--decision time 
and movement time. Decision time includes the period between stimulus 
onset and the point at which the subject removes his/her hands to make a
response. This interval reflects the time required to process the informa-
tion to determine the correct response. Movement time involves the period
between removing one's hands from the "home" buttons and striking a re-
sponse key. The "home" buttons on the response pedestal, then, are de-
signed to investigate the two theoretically independent components of reac-
tion time. Results from an investigation of these measures appear through-
out the following sections.
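
The decomposition just described can be sketched in Python as follows. The record layout and field names are hypothetical, chosen only for illustration; the original testing software is not described at this level of detail.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """Event times for one item, in seconds (hypothetical record layout)."""
    stimulus_onset: float   # stimulus appears on the screen
    home_release: float     # hand leaves the "home" button
    response_press: float   # response button is struck


def split_reaction_time(trial):
    """Return (decision time, movement time, total reaction time) in seconds."""
    decision_time = trial.home_release - trial.stimulus_onset
    movement_time = trial.response_press - trial.home_release
    return decision_time, movement_time, decision_time + movement_time


dt, mt, rt = split_reaction_time(Trial(2.00, 2.30, 2.58))
# Report in hundredths of a second, the unit used in the chapter's tables.
print(f"DT = {dt * 100:.0f}, MT = {mt * 100:.0f}, RT = {rt * 100:.0f}")
```

Recording the release of the "home" button as its own event is what makes the two components separable; with a single keypress timestamp only the total could be obtained.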

For each test described, we provide a schematic diagram depicting the 
important components of each test. A key to these schematic diagrams is 
provided in Figure 5.2. As noted on the key, the diagram is used to
identify test components such as delay periods, operations such as decision
time or movement time, and responses recorded such as correct or incorrect
response, reaction time, or distance measures. These diagrams are designed
to provide a more graphic picture of the activities involved in each test.
[Symbol] - Physical operation performed by the subject
[Symbol] - Cognitive operation performed by the subject
[Symbol] - Computer presentation

DP    - Delay Period
DT    - Decision Time
MT    - Movement Time
RT    - Total Reaction Time
ISD   - Interstimulus Delay
R/L/B - Response Hand recorded: right, left, or both
C/I   - Correct or Incorrect Response recorded
S1    - First Stimulus
S2    - Second Stimulus
d     - Distance from crosshairs to the center of the target

Figure 5.2. Key to flow diagrams of computer-administered tests.
REACTION TIME (PROCESSING EFFICIENCY)



This construct involves speed of reaction to stimuli, that is, the
speed with which a person perceives the stimulus independent of any time
taken by the motor response component of the classic reaction time mea-
sures. According to our definition of this construct, which is an indica-
tor of processing efficiency, it includes both simple and choice reaction
time (RT).

Simple Reaction Time: Reaction Time Test 1 

Test Description. The basic paradigm for this task stems from
Jensen's research involving the relationship between reaction time and
mental ability (Jensen, 1982). As part of this research program, Jensen
designed two procedural paradigms to obtain independent measures of deci-
sion time and movement time. According to current theory, these are two
independent components of reaction time. Procedures for capturing these
reaction time measures are described below.

At the computer console, the subject is instructed to place his/her
hands on the green "home" buttons in the ready position. When the subject
is in the ready position, the first item is presented. On the computer
screen, a small box appears. After a delay period (ranging from 1.5 to 3.0
seconds) the word yellow appears in the box. At this point, the subject
must remove his/her preferred hand from the "home" buttons to strike the
yellow key on the testing panel. The subject must then return his/her
hands to the ready position to receive the next item. Figure 5.3 contains
a schematic depiction of the simple reaction time task.
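
[Flow diagram, in sequence: Delay Period (DP); Decision Time (DT);
Release Home, with response hand recorded (R/L/B); Movement Time (MT);
correct or incorrect response recorded (C/I). Reaction Time (RT) spans
Decision Time and Movement Time.]

Figure 5.3. Reaction Time Test 1.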
This test contains 15 items. Although it is self-paced, on each item
subjects are given 10 seconds to respond before the computer times out^1 and
prepares to present the next item.

Test Characteristics. Table 5.1 contains data on the test character-
istics from the Fort Lewis pilot test. Variables appearing in the upper
portion of the table provide descriptive information about test perfor-
mance. Note that, on the average, subjects read the test instructions in
2.5 minutes, although this ranged from about half a minute to 5 minutes.
Further, subjects completed the test in an average of 1.2 minutes; this
time ranged from three-quarters of a minute to over 5 minutes. Total test
time ranged from 1.6 to 7.1 minutes with a mean of 3.7 minutes.

Also note that very few subjects timed out or provided invalid re-
sponses. The maximum number of time-outs for any subject was three; the
maximum number of invalid responses was one. Finally, Percent Correct
values indicate nearly all subjects understood the task and performed it
correctly.

Dependent Measures^2. To identify variables of interest, we reviewed
the literature in this area. (See Keyes, A review of the relationship
between reaction time and mental ability, 1985.) Results from this review
indicated that reaction time is often calculated for decision time, move-
ment time, and total time. See Figure 5.3 for points at which these
measures are obtained. In addition, intra-individual variation measures
(the standard deviation of total reaction time scores) calculated for each
subject appear to provide useful information. We began isolating dependent
measures of interest by calculating these four variables.

When we examined reaction times for each item on this test, we dis- 
covered that these times were very high for the first few items (up to the 
fifth item). Observation of the subjects when they were taking the test 
had alerted us to this possibility. Since this was the first test adminis-
tered, the subjects were still somewhat unfamiliar with the response pedes-
tal and the general nature of taking computer-administered tests. Accord-
ingly, we decided to view the first five items as warm-up or practice items
and to include only the last ten responses in calculating mean reaction
scores.

Further, because subtle events (e.g., subject stretching or effec- 
tively guessing when the next item will appear) may produce extreme reac- 
tion time scores for a single item, we decided to use trimmed mean scores 
for decision, movement, and total time. These trimmed scores include 
responses to items six through 15 with the highest and lowest reaction time 
values removed. 
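
The scoring rule described above (discard the five warm-up items, then drop the single highest and single lowest of the remaining ten times before averaging) can be sketched in Python. The function name and the example times are illustrative only and are not taken from the pilot data.

```python
def trimmed_mean_score(item_times, warmup_items=5):
    """Mean of the scored items after removing the highest and lowest value.

    item_times: reaction times for items 1..15 in presentation order.
    The first `warmup_items` entries are treated as practice and ignored.
    """
    scored = sorted(item_times[warmup_items:])
    if len(scored) < 3:
        raise ValueError("need at least three scored items to trim both ends")
    trimmed = scored[1:-1]  # drop one lowest and one highest value
    return sum(trimmed) / len(trimmed)


# Hypothetical decision times in hundredths of a second for one subject.
times = [90, 70, 60, 50, 45, 30, 28, 31, 29, 120, 27, 30, 32, 26, 29]
print(trimmed_mean_score(times))  # 29.5
print(sum(times[5:]) / 10)        # 38.2, the untrimmed mean of items 6-15
```

In this example a single slow response (120) raises the untrimmed mean by almost nine hundredths of a second, which is the kind of distortion the trimming is meant to limit.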



^1 Time-outs occur if a subject fails to respond within a specified period
of time. Invalid responses occur when a subject strikes the wrong key. In
both cases, the item disappears from the computer screen and, after the
subject resumes the ready position, the next item appears on the screen.

^2 Dependent variables are mean scores (e.g., Decision Time) on the tests.
Throughout this chapter the terms "dependent variable" and "test score"
can be viewed as interchangeable.
Table 5.1

Pilot Test Results From Fort Lewis: Reaction Time Test 1 (Simple Reaction
Time) (N = 112)

Descriptive Characteristics               Mean      SD        Range
----------------------------------------------------------------------------
Time to Read Instructions (minutes)       2.51      .81     .63 - 5.01
Time to Complete Test (minutes)           1.22      .62     .79 - 5.19
Total Test Time (minutes)                 3.72      .99    1.59 - 7.10
Time-Outs (number per person)              .05      .31       0 - 3
Invalid Responses (number per person)      .07      .26       0 - 1
Percent Correct                             99        3      80 - 100

Dependent Measures^a                      Mean      SD        Range        Rxx^b
----------------------------------------------------------------------------------
Decision Time (10 items)                 30.50    10.15   17.90 - 109.78    .91
Trimmed^c Decision Time (8 items)        29.25     8.10   18.75 -  82.00    .92
SD - Decision                             7.85    12.05     .92 - 118.26    .77
Movement Time (10 items)                 27.35     8.98   15.50 -  91.33    .75
Trimmed Movement Time (8 items)          28.01     7.26   15.50 -  55.86    .94
SD - Movement                             6.68    12.77     .75 - 121.07    .20
Total Time (10 items)                    57.84    15.78   37.90 - 149.56    .90
Trimmed Total Time (8 items)             55.92    13.86   37.75 - 124.71    .94
SD - Total                               11.79    16.80    1.58 - 125.85    .66

^a All values reported are in hundredths of a second.

^b Rxx = odd-even correlations, corrected to full test length using the
Spearman-Brown formula.

^c Trimmed scores are based on responses to items 6-15, excluding the
highest and lowest scores.
Mean values for all the above dependent measures were calculated.
They appear in the lower portion of Table 5.1. Also included in this table
are reliability estimates for each measure (computed using an odd-even
method with a Spearman-Brown correction). For the most part, these values
are quite acceptable. Reliability for trimmed mean scores appears to be
slightly higher than for those mean scores including all ten items.
Further, reliability estimates for the standard deviation measures are the
lowest of all estimates.
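
The odd-even method with a Spearman-Brown correction can be sketched as follows. This is a minimal illustration, not the original analysis code: the function names and the four-subject data set are hypothetical, and the sketch assumes each half-test score is the mean of the odd-numbered or even-numbered items. The correction itself is the standard Spearman-Brown formula for doubling test length, r_full = 2r / (1 + r).

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman_brown(r_half):
    """Step a half-test correlation up to full test length."""
    return 2 * r_half / (1 + r_half)

def odd_even_reliability(scores_by_subject):
    """scores_by_subject: one list of item scores per subject."""
    odd = [mean(s[0::2]) for s in scores_by_subject]    # items 1, 3, 5, ...
    even = [mean(s[1::2]) for s in scores_by_subject]   # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd, even))

# Hypothetical item scores (hundredths of a second) for four subjects.
subjects = [
    [30, 32, 31, 29],
    [40, 42, 41, 43],
    [25, 27, 26, 24],
    [35, 33, 36, 34],
]
rxx = odd_even_reliability(subjects)
```

Because the correction always raises a positive half-test correlation, a half-test value of .50 corresponds to a full-length estimate of about .67.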

To identify dependent measures for inclusion in subsequent analyses,
we graphed the various reaction time scores across the 15 items. That is,
mean reaction time scores were plotted for decision time, movement time,
and total time across the items. These graphs indicate that movement time
and total reaction time yield very similar profiles (i.e., begin at a
moderately high level, drop off, and then begin to stabilize). Decision
time, however, provides a slightly different profile. The graph for
decision time begins at a moderately high level and drops off for the first
half of the items. After that, however, it becomes very unstable and no
consistent trend shows.

The relationship among these measures of reaction time was further
examined by computing all pairwise correlations for each item. Mean and
median values of these item-by-item correlations appear in Table 5.2 for all
items (15) and for the reduced set of items (10). These results indicate
that a low to moderate relationship exists between movement time and
decision time (r = .32 for 10 items). Movement time appears to be providing
kinds of information similar to total time (r = .77 for 10 items). Decision
time, however, provides additional information (r = .50 for 10 items).

Table 5.2

Mean Correlations Among Decision, Movement, and Total Times: Reaction Time
Test 1

                                    Mean      SD    Median

Decision Time and Total Times
   15 items                          .61     .31     .64
   10 items                          .50     .29     .54

Movement and Total Times
   15 items                          .80     .15     .87
   10 items                          .77     .16     .85

Decision and Movement Times
   15 items                          .36     .25     .34
   10 items                          .32     .25     .30
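
The item-by-item summary reported in Table 5.2 can be illustrated with a short sketch. It assumes that, for each item, two measures are correlated across subjects and that the resulting correlations are then averaged across items. The function names and the tiny data set are hypothetical.

```python
from statistics import mean, median

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def item_by_item_summary(measure_a, measure_b):
    """measure_a[s][i] and measure_b[s][i]: score of subject s on item i.

    Returns the mean and median of the per-item correlations.
    """
    n_items = len(measure_a[0])
    per_item = []
    for i in range(n_items):
        col_a = [row[i] for row in measure_a]   # all subjects, item i
        col_b = [row[i] for row in measure_b]
        per_item.append(pearson_r(col_a, col_b))
    return mean(per_item), median(per_item)

# Hypothetical scores for three subjects on two items.
decision = [[1, 5], [2, 3], [3, 4]]
movement = [[-1, 5], [-2, 3], [-3, 4]]
mean_r, median_r = item_by_item_summary(decision, movement)
```

Restricting the columns to the last ten items before calling the function would give the "10 items" rows of a table like Table 5.2.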

On the basis of these data, we made the following decisions: 

• Subjects' scores to be analyzed should include decision time, total
time, and total within-person variation score (an individual's
standard deviation computed with the total score).

• For reaction time measures, the trimming procedure would be used in
computing decision and total mean reaction times.

• Percent Correct scores would be computed. Although no subjects
were being omitted because of incorrect or invalid responses, this
could become necessary for future samples.

• Practice effects (repeating the same measure several times in a
single session) should be examined, along with test-retest effects.
This was planned for the Fort Knox field test.

Correlations With Other Measures. Correlations of simple reaction
times with measures derived from all computer-administered tests (which are
described in the sections that follow) are provided in Table 5.3.

Note that correlations among Simple Reaction Time measures (Percent
Correct omitted from this analysis) indicate that the three correlate very
highly with one another (Decision with Total = .85; Total SD with Total
= .67; and Decision with Total SD = .71). Decision and Total times for
simple reaction time correlate moderately with their Choice Reaction Time
counterparts (range .36 to .57), which are described in the next section.

Correlations of Simple Reaction Time measures with computer test
dependent measures from constructs other than processing efficiency
indicate that for Decision Time the highest correlations appear with
Perceptual Speed and Accuracy (PS & A) Intercept (.30), Grand Mean (.29),
and Memory Intercept (.30). Total time also correlates highest with PS & A
Intercept (.45). Total Standard Deviation correlates highest with Memory
Intercept (.29). These correlations are about as expected since the
correlated scores are all reaction times or intercepts based on reaction
times for perceptual kinds of tests. (Memory involves a perceptual
component even though it is primarily a measure of the Memory construct.)

Correlations of the various computer-administered measures with the
cognitive paper-and-pencil measures described in Chapters 3 and 4 are shown
in Table 5.4. These correlations indicate that Decision Time, Total
Standard Deviation, and Percent Correct are virtually unrelated to scores
on the paper-and-pencil measures. Total reaction time, however, correlates
highest with the Maze (-.39), Path (-.23), and Orientation 1 (-.23) Tests.
These negative correlations indicate that "better" (faster) total reaction
time scores are associated with better (higher) paper-and-pencil test
scores.

Finally, scores on these measures were correlated with video experience.
Mean Decision Trimmed and Mean Total Trimmed correlate near zero with this
variable. Total Standard Deviation correlates .19 and Percent Correct
correlates -.20 with this measure.

Table 5.3

Intercorrelations of Dependent Measures Developed From Computer-Administered
Tests: Fort Lewis Pilot Test

Reaction Time (RT)
   Simple - Decision Time
   Simple - Total Time
   Simple - Total SD
   Choice - Decision Time
   Choice - Total Time
   CRT - SRT (Total)

Perceptual Speed and Accuracy (PS & A)
   Slope
   Intercept
   Grand Mean RT

Short-Term Memory
   Slope
   Intercept
   Percent Correct
   Grand Mean RT

Tracking
   Test 1 - Mean Distance
   Test 2 - Mean Distance

Target Shoot
   Mean Distance

Target Identification
   Mean RT
   Percent Correct

Table 5.4

Modifications for Fort Knox Field Test. The Reaction Time Test 1
administered in the Fort Lewis pilot test remained the same for the
Fort Knox field test.

Choice Reaction Time: Reaction Time Test 2 

Test Description. Reaction time for two response alternatives (choice
reaction time, CRT) is obtained in virtually the same manner as for a
single response (simple reaction time, SRT). The major difference is in
stimulus presentation. Rather than the same stimulus, YELLOW, being
presented, the stimulus varies; that is, subjects may see the term BLUE or
WHITE on the computer screen. When one of these terms appears, the subject
is instructed to move his/her preferred hand from the "home" keys to strike
the key that corresponds with the term appearing on the screen (BLUE or
WHITE). See Figure 5.4 for a schematic depiction of the test.

This measure contains 15 items, with seven requiring responses on the WHITE
key and eight requiring responses on the BLUE key. Although the test is
self-paced, the computer is programmed to allow 9 seconds for a response
before going on to the next item. Data for all 15 items were included in
the analysis of the data from the Fort Lewis pilot test. The subjects had
become familiar enough with the response pedestal that it was not thought
necessary to treat any items as "warm-ups."

Test Characteristics. Table 5.5 provides data describing this test as
it was given in the Fort Lewis pilot test. Note that subjects were reading
the instructions more quickly than they were for simple reaction time (1.01
and 2.51 minutes, respectively) and were also finishing the test more
quickly (1.95 and 3.72 minutes, respectively).

Data on whether subjects used the same or different hands to respond
to all items indicate that 23 percent of the subjects (N = 26) consistently
used the same hand. The remainder (77% or N = 86) switched from hand to hand
at least once to respond.

We also examined reaction time differences in responding to the BLUE
and WHITE keys. These results indicate that, on the average, subjects
responded a little faster to the WHITE versus the BLUE key (64.92 versus
69.12 hundredths of a second).

Dependent Measures. In the description of simple reaction time, we
provided a rationale for the measures selected to score subjects'
responses. These same measures were also selected to score responses on
choice reaction time. Mean values along with standard deviations, ranges,
and reliability estimates are provided in Table 5.5. Note that for this
measure, only the two reaction time scores provide reliable information.

Subjects were asked to rate, on a five-point scale, their degree of
experience with video game playing, prior to completing the computer tests.
(A rating of 1 indicated no experience with video games; 5 indicated much
experience.)

Figure 5.4. Reaction Time Test 2.

Table 5.5

Pilot Test Results From Fort Lewis: Reaction Time Test 2
(Choice Reaction Time) (N = 112)

Descriptive Characteristics                Mean      SD   Range

Time to Read Instructions (minutes)        1.01     .36   .45 - 2.37
Time to Complete Test (minutes)             .95     1 0   .80 - 1.59
Total Test Time (minutes)                  1.95      --   1.37 - 3.20
Time-Outs (number per person)                 0       0   0 - 1
Invalid Responses (number per person)       .17     .10   0 - 1

Dependent Measures                         Mean      SD   Range             Rxx (a)

Mean Decision Time (b)                    3o.7o    7.7o   18.75 - 78.29       .94
Mean Total Time (b)                       65.98   10.38   37.75 - 117.29      .91
SD - Total Time (b)                        8.92    3.75   1.09 - 60.07        .10
Percent Correct                              99      --   90 - 100           -.16

Choice RT Minus Simple RT                  Mean      SD   Range             Rxx (a)

Decision Time (b)                          7.68    8.79   -43.70 - 33.99      .86
Total Time (b)                            10.37   11.15   -44.92 - 38.71      .79

(a) Rxx = odd-even correlations corrected with the Spearman-Brown formula.

(b) Values reported are in hundredths of a second. Statistics are based on
analysis of all 15 items of the test.

Another measure we looked at is the difference between mean choice
reaction time scores and simple reaction time scores, a value that is
intended to capture a speed-of-processing component. The typical choice
reaction time paradigm includes two, four, and eight response alternatives,
and processing efficiency is computed by regressing mean reaction time
score against the number of response alternatives (i.e., one, two, four,
and eight). The slope of this regression equation is interpreted as the
processing speed, or the time required to process additional information.
Because our testing pedestal does not allow for four or eight response
choices, we cannot calculate this value. Instead, we used a score showing
the difference between choice and simple reaction times. Note that
reliability estimates suggest these values are internally consistent.
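
The relationship between the regression slope and the difference score can be illustrated with a short sketch. All reaction times below are hypothetical, and `fit_line` is an illustrative helper rather than the original analysis routine; it fits an ordinary least-squares line to mean reaction time as a function of the number of response alternatives.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for paired values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical mean reaction times (hundredths of a second) for a full
# paradigm with one, two, four, and eight response alternatives.
full_paradigm = {1: 56.0, 2: 62.0, 4: 74.0, 8: 98.0}
slope, intercept = fit_line(list(full_paradigm), list(full_paradigm.values()))

# With only the simple (one alternative) and choice (two alternatives)
# conditions available, the substitute score is the plain difference.
simple_rt, choice_rt = 56.0, 62.0
difference = choice_rt - simple_rt
two_point_slope, _ = fit_line([1, 2], [simple_rt, choice_rt])
```

With only one and two alternatives, the line through the two points has a slope equal to the choice-minus-simple difference, which is one way to see why the difference score is a reasonable stand-in when the full paradigm is unavailable.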

Correlations With Other Measures. Correlations with measures derived
from other computer-administered tests are reported in Table 5.3. These
values indicate that choice decision and choice total times are highly
correlated (r = .78). (Standard deviation total and percent correct were
omitted from these analyses due to low reliability.) Choice decision and
choice total times correlate moderately with their simple reaction time
counterparts. Also note that the experimental variable, Choice Total Time
Minus Simple Total Time, correlates highly with Simple Reaction Time
measures, but only moderately with Choice Reaction Time measures.

Choice Decision and Choice Total yield fairly similar correlation
patterns with scores from other computer tests. These measures correlate
highest with PS & A Intercept (r = .37 and r = .53, respectively), Target
Identification Mean RT (r = .29 and r = .45), and Memory Intercept (r = .29
and r = .41) and Grand Mean (r = .33 and r = .40). In addition, choice
total yields moderate correlations with Tracking 1 Mean (r = .39), Tracking
2 Mean (r = .33), and PS & A Grand Mean (r = .36). Again, just as for
Simple Reaction Time, these correlations show an association between
reaction times for the perceptual tasks, except for the moderate
correlations with Tracking 1 and 2, which are somewhat unexpected but may
indicate an association based on movement speed.

Correlations of choice reaction time measures with cognitive
paper-and-pencil measures appear in Table 5.4. These data indicate that
choice decision and total time correlate highest with the Maze Test
(r = -.28 and -.47, respectively). Total time, in fact, yields moderate
correlations across all paper-and-pencil cognitive measures. As noted
before, these negative correlations actually indicate that "better" scores
are associated, since lower scores on reaction time indicate better
performance and higher scores on the paper-and-pencil tests indicate better
performance.

Modifications for Fort Knox Field Test. No changes were made to this
test for the Fort Knox field test.

SHORT-TERM MEMORY



This construct is defined as the rate at which one observes, searches,
and recalls information contained in short-term memory.

Memory Search Test

The marker used for this test is a short-term memory search task
introduced by S. Sternberg (1966, 1969). In this test, the subject is
presented with a set of one to five familiar items (e.g., letters); these
are withdrawn and then the subject is presented with a probe item. The
subject is to indicate, as rapidly and as accurately as possible, whether
or not the probe was contained in the original set of items, now held in
short-term memory. Generally, mean reaction time is regressed against the
number of objects in the item or stimulus set. The slope of this function
can be interpreted as the average increase in reaction time with an
increase of one object in the memory set, or the rate at which one can
access information in short-term memory.

Test Description. The measure developed for computer-administered
testing is very similar to that designed by Sternberg. At the computer
console, the subject is instructed to place his/her hands on the green home
buttons. The first stimulus set then appears on the screen. A stimulus
contains one, two, three, four, or five objects (letters). Following a .5-
or 1-second display period, the stimulus set disappears and, after a delay,
the probe item appears. Presentation of the probe item is delayed by
either 2.5 or 3 seconds. When the probe appears, the subject must decide
whether or not it appeared in the stimulus set. If the item was present in
the stimulus set, the subject removes his/her hands from the home buttons
and strikes the white key. If the probe item was not present, the subject
strikes the blue key. (See Figure 5.5 for schematic depiction of the
memory search task.) Fifty items were included on this test for the Fort
Lewis administration.

Parameters of interest include, first, stimulus set length, or number
of letters in the stimulus set. Values for this parameter range from one
to five. The second parameter, observation period and probe delay period,
includes two levels. The first is described as long observation and short
probe delay; time periods are 1 second and 2.5 seconds, respectively. The
second level, short observation and long probe delay, includes periods
of .5 second and 3 seconds, respectively. The final parameter, probe
status, indicates that the probe is either in the stimulus set or not in
the stimulus set. These parameters will be discussed in more detail below.

Test Characteristics. Table 5.6 provides descriptive information for
the Memory Search Test from the pilot test at Fort Lewis. These data
indicate that subjects, on the average, read the test instructions in 3
minutes (range, 1.6 - 5.8) and completed the test in 9 minutes (range, 8.4
- 11.7). Thus, total testing time for the average subject is 12 minutes
(range, 10.4 - 17.5). Further, subjects allowed very few time-outs (mean
= .17, SD = .80) and provided about five invalid responses (range 0-28).
Overall, total percent correct is 90. However, the range of Percent
Correct values, 44 to 100, indicates that at least one subject was
performing at a lower than chance level.



Figure 5.5. Memory Test.

Table 5.6

Pilot Test Results From Fort Lewis: Memory Search
(N = 112)

Test Characteristics                       Mean      SD   Range

Time to Read Instructions (minutes)        3.06     .76   1.64 - 5.81
Time to Complete Test (minutes)            9.00     .54   8.37 - 11.71
Total Test Time (minutes)                 12.07    1.06   10.43 - 17.52
Time-Outs (number per person)               .17     .80   0 - 8
Invalid Responses (number per person)      4.86    4.72   0 - 28

Dependent Measures (a)                     Mean      SD   Range             Rxx (b)

Slope (c)                                  7.19    6.14   -12.70 - 41.53      .54
Intercept (c)                             97.53   30.28   44.91 - 230.97      .84
Grand Mean (c)                           119.05   29.84   67.71 - 262.35      .88
Percent Correct                              89      10   44 - 100            .95

(a) See text for explanation of these measures.

(b) Rxx = odd-even correlation corrected with the Spearman-Brown formula.

(c) Values reported are in hundredths of a second. Statistics are based on an
analysis of items answered correctly. (There were 50 items on the test.)



Dependent Measures . For this test, mean values for decision time, 
movement time, and total time were computed and then plotted against item 
length, defined as the number of letters in the stimulus set. These plots 
indicated that decision and total time produce very similar profiles, 
whereas movement time results in a nearly flat profile. Since decision 
time and total time yield similar information and movement time appears to 
serve as a constant, we could have used either decision or total reaction 
time to compute scores on this measure. We elected to use total reaction 
time. 

Subjects receive scores on the following measures:

• Slope and Intercept. These values are obtained by regressing mean
total reaction time (correct responses only) against item length.
In terms of processing efficiency, Slope represents the average
increase in reaction time with an increase of one object in the
stimulus set. Thus, the lower the value, the faster the access.
Intercept represents all other processes not involved in memory
search, such as encoding the probe, determining whether or not a
match has been found, and executing the response.

• Percent Correct. This value is used to screen subjects completing
the test. For example, recall that in Table 5.6 we indicated that
one subject correctly answered 44 percent of the items. Computing
the above scores (e.g., Slope and Intercept) for this subject would
result in meaningless information. Thus, Percent Correct scores
are used to identify subjects performing at very low levels,
thereby precluding computation of the above scores.

• Grand Mean. This value is calculated by first computing the mean
reaction time (correct responses only) for each level of stimulus
set length (i.e., one to five). The mean of these means is then
computed.
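
The four scores listed above can be sketched for a single subject as follows. This is an illustration under stated assumptions, not the original scoring program: the trial format `(set_length, total_rt, correct)` and the function name are hypothetical, the regression is fitted to the mean reaction time at each set length, and every presented item (including any answered incorrectly) is assumed to count toward Percent Correct.

```python
from collections import defaultdict
from statistics import mean

def memory_search_scores(trials):
    """trials: iterable of (set_length, total_rt, correct) for one subject."""
    rts_by_length = defaultdict(list)
    n_trials = n_correct = 0
    for set_length, total_rt, correct in trials:
        n_trials += 1
        if correct:
            n_correct += 1
            rts_by_length[set_length].append(total_rt)   # correct responses only

    lengths = sorted(rts_by_length)
    cell_means = [mean(rts_by_length[k]) for k in lengths]

    # Least-squares line through the (set length, mean RT) points.
    # Requires correct responses at two or more set lengths.
    mx, my = mean(lengths), mean(cell_means)
    sxx = sum((x - mx) ** 2 for x in lengths)
    slope = sum((x - mx) * (y - my) for x, y in zip(lengths, cell_means)) / sxx

    return {
        "slope": slope,                       # added time per extra object
        "intercept": my - slope * mx,         # time for all other processes
        "grand_mean": my,                     # mean of the per-length means
        "percent_correct": 100 * n_correct / n_trials,
    }

# Hypothetical subject: RT rises 7 hundredths of a second per added letter.
trials = [(k, 90 + 7 * k, True) for k in range(1, 6) for _ in range(2)]
trials.append((3, 400, False))                # one incorrect response, excluded
scores = memory_search_scores(trials)
```

A screening rule based on Percent Correct would be applied before these values are interpreted; the cutoff to use is not specified here.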

Table 5.6 contains the mean, standard deviation, range, and reliabil- 
ity estimates for each of the dependent measures. Note that these values 
indicate that all measures except the Slope yield fairly high internal 
consistency values. 

Correlations With Other Measures. The four dependent measures
computed for the Short-Term Memory Test were correlated with scores
generated from the other computer-administered tests of the battery and
with scores on the cognitive paper-and-pencil tests (Tables 5.3 and 5.4,
respectively). Results for these four dependent measures varied, and are
discussed separately.

Short-Term Memory Slope yielded correlations ranging from -.31 to .29
with other computer measures. Lowest values were with Choice Reaction Time
Total (r = -.02) and Target Tracking 2 (r = .02), while highest values were
with Memory Intercept (r = -.31) and Grand Mean (r = .29). Dependent
measures from other computer tests correlating moderately with Memory Slope
include Simple Reaction Time Total SD (r = -.11) and Target Identification
Mean Reaction Time (r = .13). When correlated with cognitive
paper-and-pencil tests, Short-Term Memory Slope yielded generally low
relationships. The highest correlation was .13 with the Maze Test.

Short-Term Memory Intercept correlated highest with the Memory Grand
Mean (r = .82), Target Identification Mean Reaction Time (r = .4F),
Perceptual Speed and Accuracy Intercept (r = .44), and Choice Reaction Time
Total (r = .41). Low relationships were found with the difference between
choice and simple reaction times (r = .00), Perceptual Speed and Accuracy
Slope (r = .09), Target Shoot Mean Distance (r = .10), and Target
Identification Percent Correct (r = .09). With the cognitive
paper-and-pencil measures, Memory Intercept showed generally moderate
relationships, for example, with Maze (r = -.40), Object Rotation
(r = -.30), and Orientation 1 (r = -.26).

Short-Term Memory Percent Correct correlated most strongly with
Perceptual Speed and Accuracy Intercept (r = -.43), and with other measures
on the Memory Test (r = -.33 with Intercept, r = -.41 with Grand Mean). Weak
correlations were found between Short-Term Memory Percent Correct and
Choice Reaction Decision Time (r = -.06) and Perceptual Speed and Accuracy
Grand Mean (r = .01). It correlated fairly highly with Path (r = .46) and
moderately with several other cognitive written tests, while the lowest
coefficients were with Object Rotation and Shapes (r = .17 for both).

Finally, the last dependent measure of the Short-Term Memory Test was
the Grand Mean Reaction Time (for correct responses only). This correlated
most highly with the computer measures of Mean Reaction Time on Target
Identification (r = .54) and the Perceptual Speed and Accuracy Intercept
(r = .48), as well as the Short-Term Memory Intercept (r = .82). Lowest
correlations were found with the difference between choice and simple
reaction time (r = .02) and with the Target Identification Percent Correct
(r = .05). Strongest relationships with the cognitive paper-and-pencil
tests were found between the Short-Term Memory Grand Mean and Maze
(r = -.33) and Orientation 1 (r = -.32). Lowest were with Orientation 2
and 3 (r = -.11 and -.16, respectively).

To sum up these correlations, the Grand Mean RT and Intercept for
memory show highly similar patterns of correlations with other
computer-administered tests and with cognitive paper-and-pencil tests. Both
measures are moderately correlated with Reaction Time scores and Intercept
scores on other computer-administered tests, and have low to moderate
correlations with paper-and-pencil test scores. The Slope score for memory
shows low correlations with scores on almost all other measures. The
Percent Correct score for memory shows low to moderate negative
correlations with Reaction Time and Intercept scores on other
computer-administered measures, and moderate correlations with scores on
cognitive paper-and-pencil tests. These patterns of correlations are about
as expected and seem to indicate that the Memory Test scores contribute
some fairly unique variance to the PTB.

Modifications for Fort Knox Field Test. Results from an analysis of
variance (ANOVA) conducted for the Fort Lewis pilot test data were used to
modify this test for the Fort Knox field test. As noted earlier, the three
parameters were stimulus set length, observation period/probe delay, and
probe status. Total reaction time served as the dependent variable for
this measure. A three-way ANOVA, 5 (stimulus set length) x 2 (observation
period/probe delay) x 2 (probe status), was performed.

These data indicated that the two levels of observation period and
probe delay yielded no significant differences in reaction time (F = .27;
p<.60). For stimulus set length, levels one to five, mean reaction time
scores differed significantly (F = 84.35; p<.001). This information
confirms results reported in the literature; that is, reaction time
increases as stimulus set length increases. Finally, for probe status, in
or not in, mean reaction time scores also differed significantly
(F = 74.24; p<.001). These values indicate that subjects, on the average,
require more time to determine that a probe is not in the set than to
determine that the probe is contained in the set. Results also indicated a
significant interaction between stimulus length and probe status (F = 7.46;
p<.001).

This information was used to modify the Memory Search Test. For
example, stimulus set length had yielded significant mean reaction time
score differences for the five levels; mean reaction time for levels two
and four, however, differed little from levels three and five,
respectively. Therefore, items containing stimulus sets with two and four
letters were deleted from the test file.

Although the observation period/probe delay parameter produced
nonsignificant results, we concluded that different values for probe delay
may provide additional information about processing and memory. For
example, in the literature in this area researchers suggest that subjects
begin with a visual memory of the stimulus objects, which begins to decay
after a very brief period, .5 second. To retain a memory of the object set,
subjects shift to an acoustic memory; that is, subjects rehearse the sounds
of the object set and recall its contents acoustically (Thorson, Hochhaus,
& Stanners, 1976). Therefore, we changed the two probe delay periods to .5
second and 2.5 seconds. These periods are designed to assess the two
hypothesized types of short-term memory: visual and acoustic.

Finally, consideration of the probe status parameter led us to modify
one-half of the items in the test to include unusual or unfamiliar objects:
symbols, rather than letters. In part, the rationale for using letters
or digits in a problem involves using overlearned stimuli so that novelty
of the stimulus does not affect processing of the material. We elected,
however, to add a measure of processing and recalling unusual material,
primarily because Army recruits do encounter and are required to recall
stimuli that are novel to them, especially during their initial training.
Consequently, one-half of the revised test items ask subjects to observe
and recall unfamiliar symbols rather than letters.

The test then, as modified, contained 48 items, one half consisting of
letters and the other half of symbols. Within each item type, three levels
of stimulus length are included. That is, for items with letter stimulus
sets, there are eight items with a single letter, eight with three, and
eight with five letters; the same is done for items containing symbols.
Within each of the stimulus length sets, four items include a .5-second
probe delay and four contain a 2.5-second probe delay period. Across all
items (N = 48), probe status is equally mixed between "in" and "not in" the
stimulus set. With the test constructed, the effects of stimulus type,
stimulus set length, probe delay period, and probe status can be examined.



The perceptual speed and accuracy (PS & A" construct involves the 
ability to perceive visual info.Tiation quickly and accurately and to per- 
form simple processing tasks with the stimulus (e.g., make comparisons). 
This requires the ability to make rapid scanning movements without being 
distracted by Irrelevant visua'i stimuli, and measures memory, working 
speed, and sometimes eye-hand coordination. 

Perceptual Speed and Accuracy Test

Measures used as markers for the development of the computer-administered
Perceptual Speed and Accuracy Test included such tests as the Employee
Aptitude Survey (EAS-4) Visual Speed and Accuracy, and the ASVAB
Coding Speed test and the Tables and Graphs test. The EAS-4 involves the
ability to quickly and accurately compare numbers and determine whether
they are the same or different, whereas ASVAB Coding Speed measures memory,
eye-hand coordination, and working speed. The Tables and Graphs test
requires the ability to obtain information quickly and accurately from
material presented in tabular form.

Test Description. The computer-administered Perceptual Speed and
Accuracy Test requires the subject to make a rapid comparison of two visual
stimuli presented simultaneously and determine whether they are the same or
different. Five different "types" of stimuli are presented: alpha,
numeric, symbolic, mixed, and words. Within the alpha, numeric, symbolic,
and mixed stimuli, the character length of the stimulus is varied; four
different levels of stimulus length or "digit" are present: two-digit,
five-digit, seven-digit, and nine-digit. Four items are included in each
"type" x "digit" cell; for example, four items are two-digit alphas (e.g.,
XA). In its original form this test had:



16 two-digit items
16 five-digit items
16 seven-digit items
16 nine-digit items
16 word items
80 total items



Same and different responses were balanced in every cell except one; 
the four two-digit numeric items were accidentally constructed to require 
all "same" responses. Some example items are shown below: 

1. 96293    96298    (Numeric five-digit)

2. +/° <>^ +/° <>^ (Symbolic seven-digit) 

3. James Braun James Brown (Words) 

Reaction times were expected to increase with the number of digits
included in the stimulus. The rationale behind including various types of
stimuli was simply that various types of stimuli are often encountered in
military positions.

The subject is instructed to hold the home keys down to begin each
item, release the home keys upon deciding whether the stimuli are the same
or different, and press the white button if the stimuli are the same or the
blue button if the stimuli are different (see Figure 5.6).

Figure 5.6. Perceptual Speed and Accuracy Test.

Test Characteristics. The computerized Perceptual Speed and Accuracy
Test was administered to 112 individuals in the pilot test at Fort Lewis.
Some of the overall test characteristics are shown in Table 5.7.



Table 5.7

Pilot Test Results From Fort Lewis: Overall Characteristics of Perceptual
Speed and Accuracy Test (N = 112)

                                           Mean      SD   Range

Time Spent on Instructions (minutes)       2.36     .59   1.37 - 4.30
Time Spent on Test Portion (minutes)       7.82    1.04   5.82 - 12.41
Total Testing Time (minutes)              10.18    1.37   7.45 - 14.88
Time-Outs (number per person)              9.57    6.17   0 - 35
Invalid Responses (number per person)       .94    1.20   0 - 6

The average total testing time was just over 10 minutes (range = 7.4
to 14.9 minutes). Subjects were given 7 seconds to respond to each item.
There were more time-outs on this test (mean = 9.6) than on the previously
described tests. On the other hand, there were fewer invalid responses
than on Short-Term Memory (mean = .94 for Perceptual Speed and Accuracy vs.
4.86 for Short-Term Memory).

Dependent Measures. The measures obtained were: response hand, percent
correct, total reaction time, decision time, movement time, time for
instructions, and total test time. The variables to be used for scoring 
purposes or dependent measures were determined through results of ANOVAs on 
total reaction times. The resulting variables include: 

The grand mean of the mean reaction times for each digit level 
for correct responses only. 

The mean total reaction time of "word" items for correct re- 
sponses only. 

The slope and intercept for the regression of mean total reaction 
time on digits for correct responses (i.e., intercept and the 
change in total reaction time per unit change in stimulus 
length). 

The grand mean of the mean reaction times for the four "non-word"
digit levels and the "word" items.

The percent of all items answered correctly. 
The rationale behind the selection of these variables will be provided
in the discussion of the ANOVA results. 

Two two-way ANOVAs were performed on reaction times for correct
responses. The first was a Type (4 levels) x Digit (4 levels) ANOVA of total
reaction times. The results showed significant main effects for Type
[F(3,333) = 11.99, p < .001], Digits [F(3,333) = 871.46, p < .001], and their
interaction [F(9,999) = 44.14, p < .001] (see Figure 5.7).

The second ANOVA conducted was on movement times. Pure movement time 
should be a constant when response hands are balanced. The results sug- 
gested that subjects were still making their decision about the stimuli 
after releasing the home keys (see Figure 5.8). That is, the movement time 
ANOVA for Type x Digits yielded a significant main effect for Digits
[F(3,333) = 19.94, p < .001]. The interaction of Digits and Type was also
significant [F(9,999) = 7.22, p < .001].
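
The degrees of freedom reported for both ANOVAs are consistent with a fully
within-subject (repeated measures) design on the 112 pilot subjects. The
sketch below only reproduces that arithmetic; it assumes every subject
contributed to every Type x Digit cell, and the function name is
illustrative rather than taken from the report.

```python
def within_subject_df(n_subjects, levels_a, levels_b):
    """Return (effect df, error df) for each term of a two-way
    repeated measures ANOVA in which both factors are within-subject."""
    df_a = levels_a - 1
    df_b = levels_b - 1
    df_ab = df_a * df_b
    df_subjects = n_subjects - 1
    return {
        "A": (df_a, df_a * df_subjects),
        "B": (df_b, df_b * df_subjects),
        "AxB": (df_ab, df_ab * df_subjects),
    }


# Type (4 levels) x Digit (4 levels), N = 112
dfs = within_subject_df(n_subjects=112, levels_a=4, levels_b=4)
print(dfs)
```

Under these assumptions the main effects carry (3, 333) degrees of freedom
and the interaction (9, 999), matching the values in the text.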

The implications of these results are: 

Scores should be formed on total reaction times (for correct re- 
sponses) instead of decision times because subjects appear to 
continue making a decision after releasing the home keys. Thus, 
use of decision time would not include time that subjects were 
using to process items. 

Means should be computed separately for each set of items with a 
particular digit level (i.e., two, five, seven, and nine). Num- 
ber of digits had a greater effect on mean reaction time than did 
type of stimuli. Since only correct response reaction times are 
being used, subjects could raise their scores on a pooled reac- 
tion time by simply not responding to the nine-digit items. 
Thus, the mean reaction times to correct responses for each digit 
level should be equally weighted. The grand mean of the mean 
reaction times for each digit level was computed. 

The nine-digit symbolic items were probably too easy. Mean
reaction times for the nine-digit symbolic items were substantially
less than those for the other nine-digit items. Further
inspection of the items showed that some were probably being
processed in "chunks" because symbols were grouped (e.g.,
«++++*//).

Total reaction times for correct responses could be regressed on 
digit. Intercepts and slopes could be computed for individuals 
by means of a repeated measures regression (i.e., the trend 
appeared to be linear). 
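
The scoring rules implied by the points above can be sketched as follows.
This is a minimal illustration, not the project's scoring program: the
function name and data layout are hypothetical, and it assumes the
regression is fit to the per-level mean reaction times rather than to
individual trials.

```python
from statistics import mean


def score_subject(trials):
    """Score one subject's non-word items.

    trials: list of (digits, total_rt, correct) tuples, with total_rt in
    hundredths of a second. Only correct responses are used.
    Returns (grand_mean, slope, intercept).
    """
    by_level = {}
    for digits, rt, correct in trials:
        if correct:
            by_level.setdefault(digits, []).append(rt)

    # One mean per digit level, then an equally weighted grand mean, so
    # skipping the hard nine-digit items cannot lower the pooled score.
    level_means = {d: mean(rts) for d, rts in by_level.items()}
    grand_mean = mean(list(level_means.values()))

    # Ordinary least squares of level mean RT on number of digits.
    xs = sorted(level_means)
    ys = [level_means[x] for x in xs]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return grand_mean, slope, intercept


# Made-up data in which RT = 100 + 30 * digits; the incorrect trial is ignored.
example = [
    (2, 160.0, True), (2, 160.0, True), (5, 250.0, True),
    (7, 310.0, True), (9, 370.0, True), (9, 700.0, False),
]
grand_mean, slope, intercept = score_subject(example)
```

With these made-up data the slope is 30 hundredths of a second per digit
and the intercept is 100, which is the "change in total reaction time per
unit change in stimulus length" interpretation given earlier.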

As a whole, the scores on the computerized Perceptual Speed and Ac- 
curacy Test were quite reliable (see Table 5.8). Reliability coefficients 
ranged from .85 for the Intercept of the regression of total reaction time 
on digits to .97 for the Grand Mean of the mean reaction times for the four 
non-words categories and for all categories. 
Figure 5.7. Type x Digit analysis of variance on Total Reaction Time.

Figure 5.8. Type x Digit analysis of variance on Movement Time.

Table 5.8

Pilot Test Results From Fort Lewis: Dependent Measure Scores From
Perceptual Speed and Accuracy Test (N = 112)

Score(a)                          Mean      SD         Range          Rxx(b)

Grand Mean of Mean Reaction      279.99    57.97    85.67 - 386.49     .97
  Times for Non-word Items

Mean Reaction Time for           351.74    68.39   198.64 - 518.64     .91
  Word Items

Grand Mean of Mean Reaction      294.22    57.13   109.34 - 412.75     .97
  Times for Word and Non-
  word Items

Intercept                         89.37    36.48    12.99 - 210.34     .85

Slope                             33.14     9.78     -.75 -  52.11     .89

Percent Correct                   86.90     8.00     56.3 - 100

(a) Reaction Time values are in hundredths of a second and are based on
    analysis of items answered correctly. (There were 80 items on the
    test.)

(b) Split-half (odd-even) reliability estimates, Spearman-Brown
    corrected.
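
For readers unfamiliar with the procedure named in the table note, an
odd-even split-half estimate with the Spearman-Brown correction can be
sketched as below. The helper names and the toy data are hypothetical, and
the report does not say exactly how the halves were formed for each score,
so this is only one plausible reading.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to full test length."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores: one list of item scores per subject, in test order."""
    odd_half = [mean(s[0::2]) for s in item_scores]   # items 1, 3, 5, ...
    even_half = [mean(s[1::2]) for s in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd_half, even_half))


# Toy data: three subjects, four items each (values are made up).
toy = [[10, 12, 11, 13], [20, 19, 21, 22], [30, 33, 29, 31]]
rxx = split_half_reliability(toy)
```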



Interrelationships Among Perceptual Speed and Accuracy Scores.
Ideally, efficient performance on the Perceptual Speed and Accuracy Test
would produce a low intercept, a low slope, and high accuracy, combined
with a fast grand mean reaction time score. Data from the Fort
Lewis testing suggest that this combination may occur infrequently. As
shown in Table 5.9, the relationship of Slope with Intercept is negative;
that is, low Intercepts tend to correspond with steep Slopes. However, it 
is possible that individuals who obtained low Intercepts simply had more 
"room" to increase their reaction times within the 7-second time limit, 
thus increasing their Slope scores. Since high Intercept values were 
related to slower Grand Mean Reaction Times, as well as less accurate per- 
formance, and more "time-outs" occurred on the nine-digit items, it is 
likely that the 7-second time limit produced a ceiling effect. 

The high positive correlation between the slope and accuracy suggests 
that performing accurately is related to increasing reaction time
substantially as the stimuli increase in length. Steeper slopes also correspond
with slower grand mean reaction times. These slower reaction times were
also related to higher accuracy. 
Table 5.9

Intercorrelations Among Perceptual Speed and Accuracy Test Scores



% Correct 



Slope 



Percent Correct 




.64^ 



Grand Mean^ 



.35^ 



.79^ 



.45^ 



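p < .001
p < .003

Grand mean reaction time in this section refers to:

$$\text{Grand Mean} = \frac{\bar{X}_{\text{2-digits}} + \bar{X}_{\text{5-digits}} + \bar{X}_{\text{7-digits}} + \bar{X}_{\text{9-digits}} + \bar{X}_{\text{words}}}{5}$$
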


Correlations With Other Measures. The Perceptual Speed and Accuracy
Test score that relates most highly with scores from the other
computer-administered tests is the Intercept (see Table 5.3). Scores correlating
most highly with the Intercept are the Choice Reaction Time Total and the 
Short-Term Memory Grand Mean Reaction Time. 

The PS & A Grand Mean Reaction Time also correlates highly with scores 
from several of the computerized tests. Among the highest of these corre- 
lations are those with Target Identification Mean Reaction Time and the 
Short-Term Memory Grand Mean Reaction Time. The PS & A Slope correlated 
with accuracy on the Short-Term Memory Test but was not highly correlated 
with most of the other computer-administered measures. 

The Perceptual Speed and Accuracy Intercept value correlates relatively
highly with all of the cognitive paper-and-pencil measures (see
Table 5.4). Its highest correlations were with Maze, which is a spatial
scanning test (r = -.57), Orientation Test 1 (r = -.5[illegible]), and
Reasoning Test 1 (r = -.48).

The Slope was most highly correlated with Reasoning Test 2 (r = .27).
Accuracy on the PS & A test was most highly correlated with Reasoning
Test 2 and Orientation Test 1 (r = .31), and Assembling Objects (r = .30).
Object Rotation (r = -.35) and Maze (r = -.33) produced moderate
correlations with the PS & A Grand Mean Reaction Time.

Generally speaking, the pattern of correlations for the Perceptual 
Speed and Accuracy scores is similar to that seen for the Memory Search 
Test. The PS & A Intercept and Grand Mean RT scores show patterns fairly 
similar to those for the same scores on the Memory Test, but PS & A
Intercept shows a much stronger relationship with the cognitive,
paper-and-pencil test scores than does the memory Intercept. Also, PS & A Slope
generally shows lower correlations with all other measures as does the 
memory Slope. 

Modifications for Fort Knox Field Test. Several changes were made to
this test following the Fort Lewis pilot test. A reduction in the number
of items was considered desirable in order to cut down the testing time,
and the reliability of the test scores (see Table 5.8) indicated that the
test length could be considerably reduced without causing the reliabilities
to fall below acceptable levels. Item deletion was accomplished in two
ways. First, all the seven-digit items were deleted (16 items). Examination
of Figure 5.7 shows that such deletions should have little effect on
the test scores, since the relationship between number of digits and
reaction time is linear, and the items containing two, five, and nine digits
should provide sufficient data points. Second, 16 more items were deleted
by deleting four items from each of the remaining three digit categories
(two, five, and nine) and from the "word" items. The following factors
were considered in selecting items for deletion:

• Item intercorrelations within stimulus type and digit size 
were examined. In many cases, one item did not correlate 
highly with the others. Items that produced the lowest inter- 
correlations were deleted. Use of this criterion resulted in 
13 item deletions. 

• When item intercorrelations did not differ substantially, 
accuracy rates and variances were reviewed but did not indi- 
cate any clear candidates for deletion. 

• When all the above were approximately equal, the decision to 
retain an item was based on its correct response (i.e., "same" 
or "different"). If retaining the item would have caused an 
imbalance between the responses, it was deleted. This was, in 
effect, a random selection. 

Deletion of the 32 items left a 48 item test. 

Several other changes were made, either to correct perceived
shortcomings or to otherwise improve the test. The symbolic nine-digit items
were modified to make them more difficult. As previously noted, these 
items had originally been developed in such a way that the symbols were in 
"chunks," thus making the items, in effect, much shorter than the intended 
nine digits; these groups were broken up. Five items were changed so that 
the correct response was "different" rather than "same" in order to balance 
type of correct response within digit level. Finally, the time allowed to 
make a response to an item was increased from 7 seconds to 9 seconds in 
order to give subjects sufficient time to respond, especially for the more 
difficult items. 

The revised test, then, contained 48 items; 36 were divided into
12 Type (alpha, numeric, symbolic, mixed) by Number of Digits (two, five,
nine) cells, and 12 were word items.
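
The item counts for the original and revised forms follow directly from
the cell structure described above. The short sketch below simply
enumerates the Type by Digit cells; the figure of three items per cell in
the revised form is inferred from 36 items spread over 12 cells, and the
function name is illustrative.

```python
from itertools import product

TYPES = ["alpha", "numeric", "symbolic", "mixed"]


def item_count(digit_levels, items_per_cell, word_items):
    """Total items = (Type x Digit cells) * items per cell + word items."""
    cells = list(product(TYPES, digit_levels))
    return len(cells) * items_per_cell + word_items


# Fort Lewis form: 16 cells of 4 items plus 16 word items.
original = item_count([2, 5, 7, 9], items_per_cell=4, word_items=16)
# Fort Knox form: 12 cells of 3 items plus 12 word items.
revised = item_count([2, 5, 9], items_per_cell=3, word_items=12)
print(original, revised)
```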
We also changed the presentation of the items so that they disappeared
from the display screen as soon as the subject released the "home" button.
This was intended to correct the problem of confounding decision time with
movement time that was discussed above.

Target Identification Test 

Test Description. The Target Identification Test is a measure of
perceptual speed and accuracy. The objects perceived are meaningful
figures, however, rather than a series of numbers, letters, or symbols as in
the preceding test.

In this test, each item shows a target object near the top of the
screen and three labeled stimuli in a row near the bottom of the screen.
Examples are shown in Figure 5.9. The subject is to identify which of the
three stimuli represents the same object as the target and to press as
quickly as possible the button (blue, yellow, or white) that corresponds to
that object. A flow chart indicating the series of events in this test is
presented in Figure 5.10.

Five parameters were varied in depicting objects for the test. The
first was type of object. The objects shown on the screen are based on
military vehicles and aircraft as shown on the standard set of flashcards
used to train soldiers to recognize equipment presently being used by
various nations. We sorted these cards into four basic types: tanks and
other tracked vehicles, fixed-wing aircraft, helicopters, and "wheeled"
vehicles. Then we prepared computerized drawings of representative objects
in each type. These drawings were not intended to be completely accurate
renditions but rather to depict the figures in a less complex drawing while
retaining the basic distinguishing features. Twenty-two drawings of
objects were prepared.

The second parameter was the position of the correct response, that
is, on the left, middle, or right side of the screen. The third parameter
was the orientation of the target object: whether it is "facing" in the
same direction as the stimuli (the objects to be matched with the target)
or in the opposite direction. This reduces to the target object "facing"
left (one's left as one looks at the screen) or "facing" right.

The fourth parameter was the angle of rotation (from horizontal) of
the target object. Seven different angular rotations were used for the
Fort Lewis administration of this test: 0°, 20°, 25°, 30°, 35°, 40°, and
45°. Example 1 in Figure 5.9 shows a rotated target object and Example 2
shows an unrotated object (0°).

The fifth parameter was the size of the target object. Ten different
levels of size reduction were used in the Fort Lewis administration: 40%,
50%, 55%, 60%, 65%, 75%, 80%, 85%, 90%, and 100%. Forty percent reduction
means that the target object was 40 percent of the size of the stimulus
objects at the bottom of the screen.

We had no intention of creating a test that had items tapping each
cell of a crossed design for these five parameters. Instead, we viewed
this tryout of the test as an opportunity to explore a number of different
factors that could conceivably affect test performance. A total of 44
Figure 5.9. Graphic displays of example items from the computer-
administered Target Identification Test (Example 1 and Example 2, each
with a labeled TARGET object).
Figure 5.10. Target Identification Test.

items were included on the test.



Test Characteristics. Table 5.10 shows data from the Fort Lewis pilot
test of the Target Identification Test. With reference to the first part
of the table, we see that the average time to read the instructions was
about 2 minutes, with a range of 1.1 to 9.2 minutes. The time required to
take the actual test averaged 3.6 minutes, and ranged from 3 to 5.5.
Hence, the total test time (instructions plus actual test) ranged from 4.1
to 12.8 minutes and averaged 5.6.

The subject has 9.99 seconds to make a response on this test. Very 
few time-outs occurred, much less than one per person on the average, and 
with a maximum of two. The number of invalid responses was fairly high for 
this test, 3.2 on the average. 

Dependent Variables. The primary dependent variables or scores for
this test were Total Reaction Time (includes both decision and movement
times) for correct responses, and the percent of responses that were
correct. Total Reaction Time was used rather than decision time because it
seems to be more ecologically valid (i.e., the Army is interested in how
quickly a soldier can perceive, decide, and take some action and not just
in the decision time). Also, various analyses of variance, discussed
below, showed similar results for the two measures.

The second part of Table 5.10 shows data from the two dependent
measures of concern: Total Reaction Time and Percent Correct. The test was
conceived as a speeded test, in the sense that each item could be answered
correctly if the subject took sufficient time to study the items and,
therefore, the reaction time measure was intended to show the most
variance. The data show that these intentions were achieved, since the mean
Percent Correct was 92.6 with a standard deviation of 8.3, while the
Reaction Time mean was 218 hundredths of a second with a standard deviation of
68.8. The reliability estimates show that the Reaction Time measure was
highly reliable (.97), and it was about 20 points higher than the
reliability for Percent Correct.

We performed a number of analyses of variance in order to investigate
the effects of the five parameters described above on the most important
dependent variable, Mean Total Reaction Time. Because of the number of
parameters and levels within each parameter, a completely crossed design
was not feasible. Instead, we carried out several one-way and two-way
ANOVAs. Basically, the analyses showed that all the parameters had
significant effects (well beyond the .01 level) on the mean reaction time score,
but that many parameters included too many levels in the sense that there
was little difference between scores for adjacent levels of a parameter.
The results of these analyses were used to guide the revision of this test,
described below.

Correlations With Other Measures. Correlations between Mean Reaction
Time and Percent Correct on the Target Identification Test and scores on
other Pilot Trial Battery tests were computed. Correlations of Mean
Reaction Time with other computer tests ranged from .06 to .58 (see Table 5.3).
The strongest relationships were with Perceptual Speed and Accuracy and
Short-Term Memory, while the weakest were with several Simple and Choice
Reaction Time measures. Percent correct correlated most highly with Short- 
Table 5.10

Pilot Test Results From Fort Lewis: Target Identification Test (N = 112)

Descriptive Characteristics              Mean         SD             Range

Time to Read Instructions (minutes)   [illegible]  [illegible]    1.10 -  9.21
Time to Complete Test (minutes)           3.61        0.45        2.96 -  5.46
Total Test Time (minutes)                 5.62        1.23        4.12 - 12.81
Time-Outs (per person)                [illegible]  [illegible]       0 -  2
Invalid Responses (per person)            3.20        3.62           0 - 29

Dependent Measures                       Mean         SD             Range         Rxx(a)

Total Reaction Time(b)                  218.51       68.75      113.10 - 492.95     .97
Percent Correct                          92.60        8.30        34.1 - 100        .78

(a) Reliability estimates computed using odd-even procedure with
    Spearman-Brown correction.

(b) In hundredths of a second.


Term Memory (r = .51 with Percent Correct) and Perceptual Speed and
Accuracy Slope (r = .27). The lowest relationships were with the reaction time
measures and two measures on the Short-Term Memory Test (r = .07 with Slope
and .05 with Grand Mean).

For Mean Reaction Time, correlations ranged from -.30 to -.50 with
paper-and-pencil tests (see Table 5.4). The strongest relationships were
with the Maze Test and Orientation Test 1; the weakest were with Assembling
Objects and Path.

Percent Correct correlations with paper-and-pencil tests ranged
from .11 to .29, the lowest being with Orientation Tests 1 and 3, and the
highest with Assembling Objects and Maze.

Modifications for Fort Knox Field Test. Two parameters of the test
were left unchanged: position of the correct response or object that
"matched" the target (left, middle, or right position) and direction in
which the target object faced (in the same or opposite direction of the
objects to be compared). Analyses of the Fort Lewis data indicated that
opposite-facing targets appeared to be a bit more difficult (i.e., had
higher mean reaction times), and data on object position showed that those
in the middle were slightly "easier" (faster reaction time). We thought it
best, however, to balance the items with respect to these two parameters in
order to control the response style.



The other three parameters were changed. The objects to be matched 
with the target were made to be all from one type (helicopters or aircraft 
or tanks, etc.) or from two types, rather than from one, two, or three. 
This was done because analyses showed the "three-type" items to be
extremely easy. Rotation angles were reduced from seven levels to just two,
0° and 45°, since analyses showed that angular rotations near 0° had very
little effect on reaction time.

Finally, the size parameter was radically changed. The target object
was either 50 percent of the size of the stimulus objects, or was made to
"move." The "moving" items were made to initially appear on the screen as a
small dot, indistinguishable, and to then quickly and successively disappear
and reappear, slightly enlarged in size and slightly to the left or right
(depending on the side of the screen on which the target initially appeared)
of the prior appearance. Thus, the subject had to observe the moving and
growing target until certain of matching it to one of the stimulus objects.
These "moving" items were thought to represent greater ecological or
content validity, but still to be a part of the perception construct.

The revised test consisted of 48 items, distributed one each in the 48 
cells depicted in Figure 5.11. 
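
Figure 5.11 is not reproduced here, but the revised parameter levels
described above multiply out to exactly 48 cells if they are fully
crossed. The sketch below enumerates one such crossing; the dictionary
keys and level labels are hypothetical, and treating the design as a
complete cross of these five two- or three-level parameters is an
assumption that happens to match the stated item count.

```python
from itertools import product

# Levels taken from the description of the revised test; labels are illustrative.
PARAMETERS = {
    "position": ["left", "middle", "right"],
    "orientation": ["same", "opposite"],
    "stimulus_types": ["one type", "two types"],
    "rotation_deg": [0, 45],
    "size": ["50 percent", "moving"],
}

# One dictionary per cell of the fully crossed design.
cells = [dict(zip(PARAMETERS, combo)) for combo in product(*PARAMETERS.values())]
print(len(cells))
```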
Figure 5.11. Distribution of 48 items on the revised Target Identification Test
according to five parameters.

PSYCHOMOTOR PRECISION



This construct is the ability to make muscular movements necessary to
adjust or position a machine control mechanism. This ability applies to
both anticipatory movements (i.e., where the subject must respond to a 
stimulus condition which is continuously changing in an unpredictable 
manner) and controlled movements (i.e., where the subject must respond to a 
stimulus condition which is changing in a predictable fashion, or making 
only a relatively few discrete, unpredictable changes). Psychomotor preci- 
sion thus encompasses two of the ability constructs identified by Fleishman 
and his associates, control precision and rate control (Fleishman, 1967). 

Performance on tracking tasks is very likely related to psychomotor 
precision. Since tracking tasks are an important part of many Army MOS,
development of psychomotor precision tests was given a high priority. The 
Fort Lewis computer-administered battery included two measures for pilot 
testing this ability. 

Target Tracking Test 1

The Target Tracking Test 1 was designed to measure subjects' ability 
to make fine, highly controlled movements to adjust a machine control mech- 
anism in response to a stimulus whose speed and direction of movement are 
perfectly predictable. Fleishman labeled this ability control precision. 

During World War II, Army Air Force researchers working in the Avia- 
tion Psychology Program used several control precision tests in an attempt 
to predict performance for several aircrew jobs (Melton, 1947). The test 
which proved to be the most valid predictor was the Rotary Pursuit Test. 
In this test the subject is presented with a round metal target which 
revolves near the edge of a phonograph-like disk. The subject is given a 
metal stylus and told to maintain contact with the target as it rotates. 
The Rotary Pursuit Test served as a model for Target Tracking Test 1. 

Test Description. For each trial of this pursuit tracking test, 
subjects are shown a path consisting entirely of vertical and horizontal 
line segments. At the beginning of the path is a target box, and centered
in the box is a crosshair. As the trial begins, the target starts to move
along the path at a constant rate of speed. The subject's task is to keep
the crosshairs centered within the target at all times. The subject uses a
joy stick, controlled with one hand, to control movement of the crosshairs.
Figure 5.12 presents a schematic representation of this task.

Several item parameters were varied from trial to trial. These in- 
clude the speed of the crosshairs, the maximum speed of the target, the 
difference between crosshairs and target speeds, the total length of the 
path, the number of line segments comprising the path, and the average 
amount of time the target spends traveling along each segment. Obviously, 
these parameters are not all independent; for example, crosshairs speed and 
maximum target speed determine the difference between crosshairs and target 
speeds. 
Figure 5.12. Target Tracking Test 1



For the Fort Lewis battery, subjects were given 18 test trials. Three
of the 18 paths were duplicates (the paths for trials 15-17 were identical
to the paths for trials 1, 2, and 7). Except for these duplicates, the
test was constructed so that the trials at the beginning of the test were
easier than trials at the end of the test. In other words, target and
crosshairs speeds were slower during the first several trials than during
the final trials, the paths were shorter, the paths included fewer line
segments, and so forth.

Dependent Measures. Two classes of dependent measures were investigated
for this test: (1) tracking accuracy, and (2) improvement in tracking
performance, based on the three duplicate paths included in the test.

Two tracking accuracy measures were investigated: time on target and
distance from the center of the crosshairs to the center of the target.
Kelley (1969) demonstrated that distance is a more reliable measure of
tracking performance than time on target. Therefore, the test program
computes the distance from the crosshairs to the center of the target



The COMPAQ video screen is divided into 200 pixels vertically and 640
pixels horizontally, with each vertical pixel equivalent to three
horizontal pixels. All distance measures were computed in horizontal pixel
units.
several times each second, and then averages these distances to derive an
overall accuracy score for that trial. Subsequently, when the distribution
of subjects' scores on each trial was examined, it was found that the
distribution was highly positively skewed. Consequently, the trial score
was transformed by taking the square root of the average distance. As a
result, the distribution of subjects' scores on each trial was more nearly
normal. These trial scores were then averaged to determine an overall
tracking accuracy score for each subject.
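
The accuracy computation just described can be sketched as follows. This
is an illustration under stated assumptions, not the test program itself:
the function names are hypothetical, the distance is assumed to be
Euclidean, and the factor of three converting vertical pixels to
horizontal pixel units is taken from the footnote on screen resolution.

```python
import math

# Per the footnote: one vertical pixel is treated as three horizontal pixels.
VERTICAL_TO_HORIZONTAL = 3


def distance_h_pixels(crosshairs, target):
    """Distance between two (x, y) screen points, in horizontal pixel units."""
    dx = crosshairs[0] - target[0]
    dy = (crosshairs[1] - target[1]) * VERTICAL_TO_HORIZONTAL
    return math.hypot(dx, dy)


def trial_score(samples):
    """Square root of the average sampled distance within one trial.

    samples: list of (crosshairs_xy, target_xy) pairs recorded during the trial.
    """
    average = sum(distance_h_pixels(c, t) for c, t in samples) / len(samples)
    return math.sqrt(average)


def overall_accuracy(trials):
    """Mean of the transformed trial scores across all trials."""
    return sum(trial_score(s) for s in trials) / len(trials)


# Two made-up trials: average distances of 4 and 9 horizontal pixels.
trial_a = [((0, 0), (4, 0)), ((10, 5), (14, 5))]
trial_b = [((0, 0), (0, 3)), ((7, 2), (7, 5))]
score = overall_accuracy([trial_a, trial_b])
```

Because the square root is applied to each trial's average before the
trials are combined, a single very poor trial pulls the overall score up
less than it would with untransformed distances.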

Prior to the Fort Lewis pilot test, it was expected that subjects'
tracking proficiency would improve considerably over the course of the
test. That was one of the reasons that initial test trials were designed
to be easier than the later test trials. However, analyses of the Fort
Lewis data revealed that subjects' performance on trials 1, 2, and 7
actually differed little from their performance on the duplicate trials 15-17.
Therefore, it was decided that no further measure of improvement in
tracking performance would be computed.

Test Characteristics. Table 5.11 presents data for Target Tracking
Test 1 based on the Fort Lewis pilot test. The 18 trials of the test
required 9 minutes to complete. Since all subjects received the same set
of paths, there was virtually no variability. Instruction time mean was
1.2 minutes. The range of total test time was from 9.4 to 12.2 minutes,
with a mean of 10.3 minutes.

Mean and standard deviation for overall accuracy score were 1.44 and .45,
respectively. As a result of the square root transformation, the
distribution of accuracy scores was only slightly positively skewed. The internal
consistency reliability of the accuracy score, computed by comparing the
mean accuracy scores for odd and even trials, was .97.



Table 5.11

Pilot Test Results From Fort Lewis: Target Tracking Test 1 (N = 112)

Descriptive Characteristics             Mean      SD        Range

Time to Read Instructions (minutes)     1.20     .43     .33 -  3.09
Time to Complete Test (minutes)         9.07     .02    9.05 -  9.12
Total Test Time (minutes)              10.27     .43    9.42 - 12.17

Dependent Measure                       Mean      SD        Range        Rxx(a)

Distance(b)                             1.44     .45     .95 -  3.40      .97

(a) Spearman-Brown corrected split-half reliability for odd-even trials.

(b) Square root of the average within-trial distance (horizontal
    pixels) from the center of the target to the center of the
    crosshairs, averaged across all 18 trials (or items) on the test.

Four one-way analyses of variance were executed to determine the
effects on tracking accuracy of average segment length, average time required
for the target to travel a segment, maximum crosshairs speed, and
difference between maximum crosshairs speed and target speed. All four item
parameters were significantly related to accuracy score, with crosshairs
speed accounting for the most variance, and difference between target and
crosshairs speed accounting for the least variance. It should be noted
that all four parameters were highly intercorrelated (the six
intercorrelations ranged from .37 to .87, with a median intercorrelation of .52), and
all four were also correlated with trial number (i.e., trials were designed
to become more difficult as the test progressed). As a result, it is
difficult to interpret the results of these ANOVAs.

Correlations With Other Measures. Table 5.3 shows the correlations
between the Target Tracking Test 1 and other computer-administered
measures. The test was highly correlated with Target Tracking Test 2
(r = .76). Because that test was intended to be a measure of a different
construct, multilimb coordination, this correlation is troubling. In part, it
reflects the great similarity of these two tests; both used the same set of
18 tracking paths, presented in the same order. The only difference was in
the type of control adjustments required; for Target Tracking Test 1
subjects used a joy stick operated with their preferred hand to make all
control adjustments, and for Target Tracking Test 2 subjects used both
hands to manipulate horizontal and vertical sliding resistors. It is
probable that the large correlation is due mainly to the high degree of
task similarity.

Target Tracking Test 1 was also significantly correlated with tracking
performance on the other psychomotor test, the Target Shoot Test (r = .32
for Distance from the center of the crosshairs to the center of the target
at the time of firing, r = .43 with percent of hits). The significant
intercorrelations among the psychomotor tests reflect a general psychomotor
ability factor. (This factor also emerged in a factor analysis of the
computer tests, discussed below.)

Correlations with Target Tracking Test 1 also exceeded .30 for four
other computer-dependent measures: Perceptual Speed and Accuracy Intercept
(r = .36), Target Identification Mean Reaction Time (r = .46), and Total
Reaction Time for the Simple and Choice reaction time tests (r = .31
and .39, respectively). These measures all reflect the speed of rather
basic cognitive processes (e.g., detection, comparison).

Target Tracking Test 1 also correlated significantly with all the 
cognitive paper-and-pencil tests in the pilot trial battery (Table 5.4). 
These correlations ranged from .27 with the Assembling Objects Test to .52 
with the Maze Test. As noted previously, most of these paper-and-pencil 
tests were designed to measure some aspect of spatial ability. In the 
literature review for the psychomotor ability domain, it was shown that 
control precision correlated more highly with spatial ability than with any 
other cognitive ability. Thus, the significant correlations between Target 
Tracking Test 1 and the paper-and-pencil tests are not surprising. 

Modifications for Fort Knox Field Test. Several changes were made in 
the paths comprising this test for the Fort Knox field test. First, all 
paths were modified so that each would run for the same amount of time 
(approximately .36 minute). The primary reason for this change was that 
the program computes distance between the crosshairs and target a set 
number of times each second. If all paths run the same amount of time, 
then the accuracy measure for all trials will be based on the same number 
of distance assessments. 

Second, three item parameters were identified to direct the format of 
test trials: maximum crosshairs speed, difference between maximum 
crosshairs speed and target speed, and number of path segments. Given these 
parameters and the constraint that all trials run a fixed amount of time, 
the values of all other item parameters (e.g., target speed, total length 
of the path) can be determined. Three levels were identified for each of 
the three parameters. These were completely crossed to create a 27-item 
test, and items were then randomly ordered. These procedures for item 
development should alleviate the problems encountered in the pilot test in 
interpreting results when item parameters are correlated. 
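
The crossing described above can be sketched in a few lines of Python. This is only an illustration: the level labels ("low", "medium", "high"), the dictionary keys, and the function name are hypothetical, since the report does not give the numeric parameter values or the randomization procedure actually used.

```python
import itertools
import random

# Hypothetical level labels; the report does not list the actual values.
LEVELS = ("low", "medium", "high")


def build_items(seed=0):
    """Completely cross three 3-level parameters, then randomize item order."""
    items = [
        {
            "max_crosshairs_speed": speed,
            "speed_difference": difference,
            "path_segments": segments,
        }
        for speed, difference, segments in itertools.product(LEVELS, repeat=3)
    ]
    # A seeded generator keeps the (assumed) random order reproducible.
    random.Random(seed).shuffle(items)
    return items


items = build_items()
print(len(items))  # 3 x 3 x 3 = 27 items
```

Because every level of each parameter is paired equally often with every level of the others, the three parameters are uncorrelated across items, and the random ordering removes the link between difficulty and trial number.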

Third, in spite of these changes, which added 50 percent more trials 
to the test, testing time was actually reduced slightly (25 seconds less, 
it was estimated) because of the standardization of the trial time. 

Target Shoot Test 

The Target Shoot Test was modeled after several compensatory and 
pursuit tracking tests used by the AAF in the Aviation Psychology Program 
(e.g., the Rate Control Test). The distinguishing feature of these tests 
is that the target stimulus moves in a continuously changing and unpredict- 
able speed and direction. Thus, the subject must attempt to anticipate 
these changes and respond accordingly. 

Test Description. For the Target Shoot Test, a target box and 
crosshairs appear in different locations on the computer screen. The target 
moves about the screen in an unpredictable manner, frequently changing 
speed and direction. The subject controls movement of the crosshairs via a 
joy stick. The subject's task is to move the crosshairs into the center of 
the target. When this has been accomplished, the subject must press a 
button on the response pedestal to "fire" at the target. The subject's 
score on a trial is the distance from the center of the crosshairs to the 
center of the target at the time the subject fires. The test consists of 
40 trials. A schematic depiction of these trials is presented in Figure 
5.13. 



Several item parameters were varied from trial to trial. These 
parameters included the maximum speed of the crosshairs, the average speed 
of the target, the difference between crosshairs and target speeds, the 
number of changes in target speed (if any), the number of line segments 
comprising the path of each target, and the average amount of time required 
for the target to travel each segment. These parameters are not all 
independent, of course. Moreover, the nature of the test creates a problem 
in characterizing some trials; a trial terminates as soon as the subject 
fires at the target, so one subject may see only a fraction of the line 
segments, target speeds, etc., that another subject sees. 

Figure 5.13. Target Shoot Test. 
(Flow chart: target and crosshairs appear; subject adjusts direction; 
subject presses response button; hit/miss.) 

Dependent Variables. Three dependent measures were obtained for each 
trial. Two were measures of firing accuracy: (1) the distance from the 
center of the crosshairs to the center of the target at the time of firing, 
and (2) whether the subject "hit" or "missed" the target. The two were 
very highly correlated. However, the former provides quite a bit more 
information about firing accuracy than the latter, so Distance was retained 
as the accuracy measure. Distances were averaged across trials to obtain 
an overall accuracy score. In some trials, the subject failed to fire at 
the target so no distance score was obtained; those trials were not 
included in the overall test accuracy score. 
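
A minimal sketch of this scoring rule follows. It assumes, following the footnote to Table 5.12, that the square root of each firing distance is taken before averaging; the function name and the use of None to mark a trial without firing are illustrative choices, not details from the report.

```python
import math


def overall_accuracy(firing_distances):
    """Mean square-root distance over the trials in which the subject fired.

    firing_distances holds one entry per trial: the distance (in pixels)
    at the time of firing, or None when the subject never fired.
    Returns None if the subject fired on no trial at all.
    """
    fired = [d for d in firing_distances if d is not None]
    if not fired:
        return None
    return sum(math.sqrt(d) for d in fired) / len(fired)


# Hypothetical three-trial record; the second trial has no shot.
print(overall_accuracy([4.0, None, 16.0]))  # (2 + 4) / 2 = 3.0
```

Returning None for a subject who never fires mirrors the case noted in Table 5.12, where one subject failed to fire at any target and therefore has no distance score.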

The third dependent measure was a speed measure, representing the time 
from trial onset until the subject fired at the target. Again, trials were 
omitted if the subject failed to fire a shot. This last measure was not 
used in any subsequent analyses, primarily because we had no clear idea 
about how to view its relationship to the construct being measured on this 
test, or to constructs measured on other tests. 

Test Characteristics. Table 5.12 presents data based on the 
Fort Lewis pilot test. The total time for this test averaged close to 4 
minutes, with about 1.6 minutes for instructions and 2.2 minutes for the 
test itself. In two or three trials, on the average, a subject failed to 
fire at the target. 

Split-half reliability across odd-even trials was .93 for Mean 
Distance and .78 for Percent Hits. The average Percent of Hits was 58, with 
a range from 0 to 83. These results show that the Distance score is 
highly reliable and has adequate variance, and the Percent of Hits score is 
acceptably reliable and also has adequate variance. Also, the 58 percent 
mean on this score shows that the test was at about the right level of 
difficulty. 
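
The reliabilities reported here are odd-even split-half correlations stepped up with the Spearman-Brown formula (see the footnote to Table 5.12). The sketch below shows that computation in outline; the function names and the data layout (one list of trial scores per subject) are assumptions made for illustration.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_brown(r_half, factor=2.0):
    """Project a half-test correlation to a test `factor` times as long."""
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


def split_half_reliability(scores_by_subject):
    """Odd-even split-half reliability, corrected to full test length."""
    odd = [mean(s[0::2]) for s in scores_by_subject]   # trials 1, 3, 5, ...
    even = [mean(s[1::2]) for s in scores_by_subject]  # trials 2, 4, 6, ...
    return spearman_brown(pearson_r(odd, even))
```

With the default factor of 2, the correction reduces to 2r / (1 + r), the usual step-up from a half-length test to the full-length test.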

Analyses of variance were executed to determine the effects of several 
item parameters (crosshairs speed, average target speed, and average seg- 
ment length) on mean distance. All were found to be related to item 
difficulty. However, interpretation of these results was made difficult by 
the correlations among the parameters and by item order effects (i.e., the 
last dozen or so trials presented the most difficult tracking problems). 

Correlations With Other Measures. Correlations with other 
computer-administered tests exceeded .30 only for the two tracking tests 
(Table 5.3). The correlation was actually higher with Tracking Test 2 
(r = .47 versus .32 for Tracking Test 1), possibly indicating that 
performance on the Target Shoot Test is influenced by multilimb 
coordination. The Target Shoot Test Mean Distance was relatively 
uncorrelated with cognitive paper-and-pencil test scores (Table 5.4). The 
highest correlation was -.23, with the Maze Test. Thus, it was felt that 
the test was not heavily dependent upon any spatial-perceptual abilities. 

Modifications for Fort Knox Field Test. Because of its high 
reliability and its independence from other ability measures, the test was 
not modified for Fort Knox field testing. 

Table 5.12 

Pilot Test Results From Fort Lewis: Target Shoot Test (N = 112) 

Descriptive Characteristics              Mean      SD     Range 

Time to Read Instructions (minutes)      1.58     .61     .51 - 5.10 
Time to Complete Test (minutes)          2.22     .23     1.81 - 3.29 
Total Test Time (minutes)                3.80     .68     2.71 - 7.58 
No. of Trials Without Firing (a)         2.77    3.97     0 - 40 

Dependent Measures                       Mean      SD     Range          rxx (b) 

Distance (c)                             2.83     .52     1.93 - 7.03     .93 
Percent of Hits (a)                      58       13      0 - 83          .78 

(a) One subject failed to fire at any targets. Excluding this subject, 
    mean, SD, and range for number of trials without firing were 2.43, 
    1.78, and 0-8, respectively; mean, SD, and range for percent of hits 
    were 59, 12, and 13-83, respectively. 

(b) Spearman-Brown corrected split-half reliability for odd-even trials. 

(c) Square root of the distance (horizontal pixels) from the center of 
    the target to the center of the crosshairs at the time of firing, 
    averaged across all trials in which the subject fired at the target. 
    (There were a total of 40 trials or items on the test.) 

MULTILIMB COORDINATION 



The multilimb coordination construct reflects the ability to 
coordinate the simultaneous movement of two or more limbs. This ability is 
general to tasks requiring coordination of any two limbs (e.g., two hands, 
two feet, one hand and one foot). The ability does not apply to tasks in 
which trunk movement must be integrated with limb movements. It is most 
common in tasks where the body is at rest (e.g., seated or standing) while 
two or more limbs are in motion. 

In the past, measures of multilimb coordination have shown quite high 
validity for predicting job and training performance, especially for pilots 
(Melton, 1947). 

Target Tracking Test 2 

Target Tracking Test 2 is modeled after a test of multilimb coordina- 
tion developed by the AAF, the Two-Hand Coordination Test. This test 
required subjects to perform a pursuit tracking task in which horizontal 
and vertical movements of the target-follower were controlled by two han- 
dles. Validities of this test for predicting AAF pilot training success 
were mostly in the .30s (Melton, 1947). 

Test Description. Target Tracking Test 2 is very similar to the 
Two-Hand Coordination Test. For each trial of Target Tracking Test 2, subjects 
are shown a path consisting entirely of vertical and horizontal lines. At 
the beginning of the path is a target box, and centered in the target box 
is a crosshairs. As the trial begins, the target starts to move along the 
path at a constant rate of speed. The subject manipulates two sliding 
resistors to control movement of the crosshairs; one resistor controls 
movement in the horizontal plane, and the other in the vertical plane. The 
subject's task is to keep the crosshairs centered within the target at all 
times. Figure 5.14 contains a schematic depiction of the test. 

This test and Target Tracking Test 1 are identical except for the 
nature of the required control manipulations. For Target Tracking Test 1 
crosshairs movement is controlled via a joy stick, while for Target Track- 
ing Test 2 crosshairs movement is controlled via the two sliding resistors. 
For the Fort Lewis battery, the same 18 paths were used in both tests, and 
the value of the crosshairs and target speed parameters was the same. The 
only other difference between the two tests was that subjects were per- 
mitted three practice trials for Target Tracking Test 2. 

Dependent Variable. The same dependent measure or score was used for 
this test as for Tracking Test 1 (i.e., the square root of the average 
within-trial distance from the center of the crosshairs to the center of 
the target, averaged across all trials). 
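
Read literally, this score can be computed as in the sketch below. The function name and the input layout (for each trial, the list of crosshairs-to-target distances sampled during that trial) are assumptions for illustration; the sampling rate used by the testing program is not reproduced here.

```python
import math
from statistics import mean


def tracking_score(trials):
    """Square root of the mean within-trial distance, averaged over trials.

    trials is a list with one entry per trial; each entry is the list of
    distances (in pixels) sampled while that trial was running.
    """
    return mean(math.sqrt(mean(samples)) for samples in trials)


# Two hypothetical trials with constant tracking error of 4 and 16 pixels.
print(tracking_score([[4.0, 4.0], [16.0, 16.0]]))  # (2 + 4) / 2 = 3.0
```

Taking the square root within each trial before averaging damps the influence of a few very poor trials, which is consistent with the report's remark that the transformed scores were only slightly skewed.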

Test Characteristics. The 18 trials of the test (Table 5.13) required 
9 minutes to complete. Since all subjects received the same set of paths, 
there was virtually no variability in test time. Mean instruction time was 
3.6 minutes. The range of total test time was from 11.5 to 15.5 minutes, 
with a mean of 12.7 minutes. 

Figure 5.14. Target Tracking Test 2. 

Table 5.13 

Pilot Test Results From Fort Lewis: Target Tracking Test 2 (N = 112) 

Descriptive Characteristics              Mean      SD     Range 

Time to read instructions (minutes)      3.58     .68     2.39 - 6.38 
Time to complete test (minutes)          9.09     .02     9.03 - 9.13 
Total test time (minutes)               12.67     .68     11.50 - 15.48 

Dependent Measure                        Mean      SD     Range          rxx (a) 

Distance (b)                             2.02     .64     0 - 4.01        .97 

(a) Spearman-Brown corrected split-half reliability for odd-even trials. 

(b) Square root of the average within-trial distance (horizontal pixels) 
    from the center of the target to the center of the crosshairs, averaged 
    across all 18 trials (or items) on the test. 

Mean and standard deviation for overall accuracy score were 2.02 
and .64, respectively. As a result of the square root transformation, the 
distribution of accuracy scores was only slightly positively skewed. The 
internal consistency reliability of the accuracy score was .97. These 
results indicate that Target Tracking Test 2 is highly reliable, as is 
Target Tracking Test 1, and that it is more difficult than Target 
Tracking Test 1 (mean Distance score for Target Tracking Test 2 = 2.02 
versus 1.44 for Target Tracking Test 1, a difference of about one standard 
deviation). 

Four one-way analyses of variance were executed to determine the 
effects on tracking accuracy of average segment length, average time 
required for the target to travel a segment, maximum crosshairs speed, and 
difference between maximum crosshairs speed and target speed. All four 
item parameters were significantly related to accuracy score, with 
crosshairs speed accounting for the most variance and average segment 
length for the least. It should be noted again that all four parameters 
were highly intercorrelated (the six intercorrelations ranged from .37 
to .87, with a median intercorrelation of .52), and all four were also 
correlated with trial number (i.e., items became more difficult as the test 
progressed). As a result, interpreting the results of these ANOVAs is 
difficult. 

Correlations With Other Measures. Table 5.3 shows the correlations 
between Target Tracking Test 2 and other computer-administered measures. 
The test was highly correlated with Target Tracking Test 1 (r = .76). 
Possible reasons for this correlation were discussed above (see 
Target Tracking Test 1). 

Given the high correlation with Target Tracking Test 1, it would be 
expected that Target Tracking Test 2 would show a similar pattern of 
correlations with other computerized and paper-and-pencil ability measures. 
As Tables 5.3 and 5.4 show, this is essentially true. The only major 
difference is that Target Tracking Test 2 failed to correlate significantly 
with mean Total Response Time from the Simple Reaction Time Test (r = .11 
versus r = .31 for Target Tracking Test 1). 

Modifications for Fort Knox Field Test. Changes in Target Tracking 
Test 2 for the Fort Knox field test mirrored those made for Target Tracking 
Test 1. Test trials were changed completely. Test development was directed 
by three item parameters: number of segments, crosshairs speed, and 
difference between target and crosshairs speeds. The revised test includes 
27 items. However, the items are no longer the same as those presented for 
Target Tracking Test 1, which should reduce the correlation between these 
tests. 

NUMBER OPERATIONS 

This construct involves the ability to perform, quickly and accu- 
rately, simple arithmetic operations such as addition, subtraction, multi- 
plication, and division. 

The current ASVAB includes a numerical operations test, containing 50 
very simple arithmetic problems with a 3-minute time limit. Because of low 
item difficulty and the speeded nature of the test, correlations with other 
ASVAB subtests indicate that Numerical Operations is most strongly related 
to Coding, a measure of perceptual speed and accuracy. The present 
military-wide selection and classification battery, then, measures very basic 
number operations abilities which appear very similar to perceptual speed 
and accuracy abilities. 

In the expert judgment process described in Chapter 1, this construct 
received a mean estimated validity of .40 with the highest value .44. The 
experts judged that this construct is an effective predictor of success in 
technical and clerical MOS. The authors, the scientific advisors, and the 
ARI scientists also thought that a computerized measure of this construct 
might prove superior to the paper-and-pencil format currently used. 

The test designed to assess number operations abilities was not com- 
pleted prior to the Fort Lewis pilot test, so no data are yet available to 
evaluate this measure. It has been prepared for administration as part of 
the test battery for the Fort Knox field test. 

Number Memory Test 

Test Description. This test was modeled after a number memory test 
developed by Dr. Raymond Christal at the Air Force Human Resources Laboratory. 
The basic difference between the AFHRL test and the Number Memory Test 
concerns pacing of the number items. The former uses machine-paced presen- 
tation, while the latter involves self-paced item presentation. Both, 
however, require subjects to perform simple number operations such as 
addition, subtraction, multiplication, and division and both involve a 
memory task. 

In the Number Memory Test, subjects are presented with a single number 
on the computer screen. After studying the number, the subject is to push 
a button to receive the next part of the problem. When the subject presses 
the button, the first part of the problem disappears and another number 
along with an operation term, such as Add 9 or Subtract 6, then appears. 
Once the subject has combined the first number with the second, he/she must 
press a button to receive the third part of the problem. Again, the second 
part of the problem disappears when the subject presses the button. This 
procedure continues until a solution to the problem is presented. The 
subject must then indicate whether the solution presented is true or false. 

An example number operation item follows: 

Item Set 

Is 16 the correct answer? 

Response:    T (White)    F (Blue) 

Figure 5.15 presents a flow chart for this test. 
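
The logic of an item can also be sketched in code. The item below is hypothetical (the operands and the "Multiply by" and "Divide by" wording are invented for illustration; only "Add" and "Subtract" appear as operation terms in the description above), and the sketch covers only the scoring key, not the self-paced presentation or timing.

```python
import operator

# Operation terms mapped to arithmetic; the last two labels are assumed.
OPERATIONS = {
    "Add": operator.add,
    "Subtract": operator.sub,
    "Multiply by": operator.mul,
    "Divide by": operator.truediv,
}


def running_total(first_number, steps):
    """Apply each (operation term, number) step to the running total."""
    total = first_number
    for term, number in steps:
        total = OPERATIONS[term](total, number)
    return total


def keyed_response(first_number, steps, presented_solution):
    """True if the presented solution is correct, False otherwise."""
    return running_total(first_number, steps) == presented_solution


# Hypothetical item: 5, Add 9, Subtract 6, Multiply by 2; "Is 16 correct?"
steps = [("Add", 9), ("Subtract", 6), ("Multiply by", 2)]
print(keyed_response(5, steps, 16))  # True: ((5 + 9) - 6) * 2 = 16
```

Because each part disappears before the next is shown, the subject must hold the running total in memory, which is what the loop variable `total` stands in for here.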

Test items vary with respect to the number of parts (four, six, or 
eight) contained in the single item. Items also vary according to the delay 
between item part presentation, or interstimulus delay period. One-half of 
the items include a brief delay (.5 second) while the other half contain a 
lengthier delay (2.5 seconds). The test contained 27 items. 

This test is not a "pure" measure of number operations, since it also is 
designed to bring short-term memory into play. We decided that this was 
the most efficient way to proceed, since a second measure of short-term 
memory was thought desirable, at least at this point in the project. 

Dependent Measures. Analyses planned for data that will be obtained 
from the Fort Knox field test administration include an investigation of 
the impact of item length (four, six, or eight) and interstimulus delay (.5 
second or 2.5 seconds) on reaction time and percent correct, as well as 
comparisons of mean reaction time scores for item parts requiring addition, 
subtraction, multiplication, and division. These analyses will be used to 
identify the dependent measures for scoring subject responses in the field 
test. 

MOVEMENT JUDGMENT 



Movement judgment is the ability to judge the relative speed and 
direction of one or more moving objects in order to determine where those 
objects will be at a given point in time and/or when those objects might 
intersect. 

Movement judgment was not one of the constructs identified and tar- 
geted for test development by the literature review or expert judgments 
described in Chapter 1. However, a suggestion by Lloyd Humphreys, one of 
our scientific advisors, and the job observations we conducted at Fort 
Stewart, Ft. Bragg, Ft. Bliss, Ft. Sill, and Ft. Knox, led us to conclude 
that movement judgment was likely to be related to job performance in a 
number of combat MOS (e.g., 16S, 11B, 19D). Therefore, we decided to 
develop a movement judgment measure to be included in the Fort Knox field 
test. 

Cannon Shoot Test 

The Cannon Shoot Test measures subjects' ability to fire at a moving 
target in such a way that the shell that is fired hits the target when the 
target crosses the cannon's line of fire. 

As part of its Aviation Psychology Program, the Army Air Force became 
interested in motion, distance, and orientation judgment and instituted 
development of a battery of motion picture and photograph tests (Gibson, 
1947). One of the AAF measures was called the Estimate of Relative Velo- 
cities Test, a paper-and-pencil test. Each trial consisted of four frames. 
In each frame, two objects (airplanes) were shown flying along the same 
path in the same direction. In each subsequent frame, the trailing plane 
edged nearer the lead plane. The subject's task was to indicate on the 
final frame where the planes would intersect. Validities of this test for 
predicting pilot training success averaged approximately .18 (Gibson, 
1947). 

The present test was designed to test the construct that seems to 
underlie the Estimate of Relative Velocities Test. 

Test Description. At the beginning of each trial, a stationary cannon 
appears on the video screen, with the position of this cannon varying from 
trial to trial. The cannon is "capable" of firing a shell, which travels 
at a constant speed on each trial. Shortly after the cannon appears, a 
circular target moves onto the screen. This target moves in a constant 
direction at a constant rate of speed throughout the trial, though the 
speed and direction vary from trial to trial. The subject's task is to 
push a response button to fire the shell in such a way that the shell 
intersects the target when the target crosses the cannon's line of fire. 
Figure 5.16 shows a flow chart for this test. 

Three parameters determine the nature of each test trial. The first 
is the angle of the target movement relative to the position of the cannon; 
12 different angles were used. The second is the distance from the cannon 
to the impact point (i.e., the point at which the target crosses the 
cannon's line of fire); four different distance values were used. Finally, 
the third parameter is the distance from the impact point to the fire point 
(i.e., the point at which the subject must fire the shell in order to hit 
the center of the target); there were also four values for this distance 
parameter. 

Figure 5.16. Cannon Shoot Test. 

If a completely crossed design were used, it would necessitate a 
minimum of 192 trials (i.e., 12 x 4 x 4 = 192). Instead, a Latin square 
design was employed, so that the version of the test for the Fort Knox 
field test includes only 48 trials. 
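
The report does not spell out the Latin square actually used. The sketch below shows one way such a reduction could work, purely as an assumption: every angle is paired with every impact distance (12 x 4 = 48 trials), and the fire-point level is assigned by a cyclic rule so that it stays balanced against both of the other parameters.

```python
from collections import Counter

N_ANGLES, N_IMPACT, N_FIRE = 12, 4, 4


def full_crossing_size():
    """Number of trials a completely crossed design would need."""
    return N_ANGLES * N_IMPACT * N_FIRE


def reduced_design():
    """48 trials: (angle, impact level, fire-point level) index triples.

    The cyclic rule (angle + impact) mod 4 is a hypothetical stand-in for
    the report's Latin square, chosen only because it is easy to verify.
    """
    return [
        (angle, impact, (angle + impact) % N_FIRE)
        for angle in range(N_ANGLES)
        for impact in range(N_IMPACT)
    ]


design = reduced_design()
print(full_crossing_size(), len(design))  # 192 48
print(Counter(fire for _, _, fire in design))  # each fire level used 12 times
```

Under this rule each angle meets all four fire-point levels once, and each impact distance meets each fire-point level three times, so main effects remain estimable with a quarter of the trials; interactions among the three parameters are what the reduction gives up.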

Dependent Measures. Three dependent measures are assessed on each 
trial. These include: (1) whether the shell hits or misses the target; 
(2) the distance from the shell to the center of the target at the time the 
target crosses the impact point; and (3) the distance from the center of 
the target to the fire point at the time the shell is fired. The Fort Knox 
data will be analyzed to determine which of these three measures is most 
reliable. Since the three will be highly intercorrelated, in the end it is 
likely that only one of the three will be retained as a dependent measure. 

Test Characteristics. Prior to the Fort Knox field test, only minimal 
preliminary data are available for this test, since it was not part of the 
Fort Lewis pilot test. It appears that the test will take approximately 12 
minutes to complete, including instructions. It also appears that all 
three item parameters are related to item difficulty. That is, targets are 
more difficult to hit if the angle of the target is greater than 90 degrees 
(i.e., the target is moving away from, rather than toward, the cannon), the 
impact point is far from the cannon, or the fire point is far from the 
impact point. Thus, targets that move rapidly are more difficult to hit 
than those that move slowly. However, all of these findings are based on 
observations of only a few subjects and are therefore tentative. 
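
The link between the fire point and target speed follows from the constant-speed motion described in the test description. As a sketch under that assumption (the function name, argument names, and units are illustrative), the shell needs a flight time equal to the cannon-to-impact distance divided by shell speed, and the target covers its own speed times that flight time before reaching the impact point.

```python
def fire_point_distance(target_speed, shell_speed, cannon_to_impact):
    """Distance of the target from the impact point at the ideal firing moment.

    Assumes the shell and the target both move at constant speed, as the
    test description states. Units are arbitrary but must be consistent.
    """
    flight_time = cannon_to_impact / shell_speed
    return target_speed * flight_time


# Hypothetical values: doubling target speed doubles the fire-point distance.
print(fire_point_distance(2.0, 10.0, 50.0))  # 10.0
print(fire_point_distance(4.0, 10.0, 50.0))  # 20.0
```

For a fixed shell speed and impact distance, a fire point far from the impact point therefore implies a fast target, which matches the observation above that rapidly moving targets are harder to hit.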






SUMMARY 

Table 5.14 shows the means, standard deviations, and split-half 
reliabilities for 24 scores computed from the eight computer-administered 
tests which were pilot tested at Fort Lewis. As referred to throughout this 
chapter, Tables 5.3 and 5.4 show the intercorrelations between computer 
test scores, and the correlations between computer test scores and 
cognitive test scores. We make no further comment here since these data 
have been thoroughly discussed throughout the chapter. 

Investigation of Machine Effects 

One concern we had prior to the Fort Lewis pilot test was the extent 
to which computer measure scores would be affected by differences between 
testing stations. A testing station is one Compaq computer and the asso- 
ciated response pedestal; six such testing stations were used at Fort Le- 
wis. As we mentioned in Chapter 1, differences across testing apparatus 
and unreliability of testing apparatus had been a problem in World War II 
psychomotor testing and thereafter. The recent advent of microprocessor 
technology was viewed as alleviating such problems, at least to some de- 
gree. 

We ran some analyses of variance to provide an initial look at the 
extent of this problem with our testing stations. Thirteen one-way ANOVAs 
were run with testing stations as levels and computer test scores as the 
dependent variables. We ran separate ANOVAs for white males and non-white 
males in order to avoid confounding the results with possible subgroup 
differences. Also, only five testing stations were used since one station 
did not have enough subjects assigned to it. These results are shown in 
Table 5.15. 

Of the 26 ANOVAs, only one reached significance at the .05 level, about 
what would be expected by chance. These results were heartening. (Note 
that the distance measures in Table 5.15 have not been converted to the 
mean square root units; these are the sums of the mean distances across all 
items.) 
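
The "expected by chance" remark can be checked with simple arithmetic. Thirteen scores analyzed separately for two subgroups gives 26 tests; the sketch below treats those tests as independent, which is an assumption made only for illustration (scores from the same subjects are in fact correlated).

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of significant results when no true effect exists."""
    return n_tests * alpha


def prob_at_least_one(n_tests, alpha):
    """Chance of one or more significant results among independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


n_tests = 13 * 2  # thirteen scores, two subgroups
print(expected_false_positives(n_tests, 0.05))
print(round(prob_at_least_one(n_tests, 0.05), 2))
```

Under these assumptions roughly one spurious result is expected across the set, so a single significant F is weak evidence of a real difference among testing stations.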

One reason for these results was the use of calibration software. 
This software adjusted for the idiosyncratic differences of each response 
pedestal, ensuring a more standardized test administration across testing 
stations. 

Pilot Test Results: Comments 

The results of the Fort Lewis pilot test of the computer-administered 
measures in the Pilot Trial Battery were extremely useful. The results 
showed very high promise for these measures in several ways: 

1. The battery proved to be basically self-administering. The 
   testing stations and battery software were successful in that 
   almost every soldier could complete the entire battery with no 
   assistance from the test monitor. 

2. Only one testing station experienced equipment problems during 
   the week of testing, showing that fairly large-scale testing 
   with portable computer equipment is feasible. 

3. The measures showed acceptable psychometric properties, 
   although there was definitely room for improvement in several 
   cases. The analyses were instructive for making these 
   changes. 

4. The soldiers liked the test battery. Virtually every soldier 
   expressed a preference for the computer-administered tests 
   compared to the paper-and-pencil tests. We thought there were 
   several reasons for this attitude: novelty; the game-like 
   nature of several tests; and the fact that the battery was, in 
   large part, self-paced, allowing each soldier to thoroughly 
   understand the instructions and to work through the battery at 
   his/her own speed. 

Table 5.14 

Means, Standard Deviations, and Split-Half Reliability Coefficients for 
24 Computer Measure Scores Based on Fort Lewis Pilot Test Data (N = 112) 

                                                 Mean        SD    Reliability (a) 

SIMPLE REACTION TIME (10 Items) 
  Mean Decision Time (hs) (b)                   29.25      8.10        .92 
  Mean Total Reaction Time (hs)           [illegible]     13.86        .94 
  Trimmed Standard Deviation (hs)         [illegible]     16.80        .66 
  Percent Correct                                  99         3       -.01 

CHOICE REACTION TIME (15 Items) 
  Mean Decision Time (hs)                       36.78      7.75        .94 
  Mean Total Reaction Time (hs)                 65.98     10.39        .91 
  Standard Deviation (hs)                        8.92      3.75        .10 
  Percent Correct                                  99         3       -.16 

DIFFERENCE IN SIMPLE & CHOICE REACTION TIME 
  Decision Time (hs)                             7.68      8.79        .86 
  Total Time (hs)                               10.37     11.15        .79 

SHORT-TERM MEMORY (50 Items) 
  Intercept (hs)                                97.53     30.28        .84 
  [Slope (hs)]                                   7.19      6.14        .54 
  Percent Correct                                  90        10        .95 
  Grand Mean (hs)                              119.05     29.84        .88 

PERCEPTUAL SPEED & ACCURACY (80 Items) 
  Intercept (hs)                                89.37     36.48        .85 
  Slope (hs)                                    33.14      9.78        .89 
  Percent Correct                                  87         8        .81 
  Grand Mean (hs)                              294.22     57.13        .97 

TARGET IDENTIFICATION (44 Items) 
  Mean Total Time (hs)                         218.51     68.75        .97 
  Percent Correct                                  93         8        .78 

TARGET TRACKING 1 (18 Items) 
  Mean Distance (m√m pixels) (c)                 1.44       .45        .97 

TARGET TRACKING 2 (18 Items) 
  Mean Distance (m√m pixels)                     2.01       .64        .97 

TARGET SHOOT (40 Items) 
  Mean Total Distance (m√m pixels)               2.83       .52        .93 
  Percent "Hits"                                   58        13        .78 

(a) Odd-even item correlation corrected to full test length with the 
    Spearman-Brown formula. 

(b) hs = hundredths of seconds. 

(c) m√m pixels = mean of the square root of the mean distance from target, 
    computed across all trials. 

Table 5.15 

Results of Analyses of Variance for Machine Effects: 
White and Non-White Males, Fort Lewis Sample 

                                              White Males                      Non-White Males 
Measure                                  N      Mean       SD    F (a)     N      Mean        SD    F (b) 

[label illegible]                       [?]    58.29    30.17    0.79     26     58.58       [?]    0.22 
Reaction Time 2: Percent Correct         45    98.91     2.57    1.13     26     97.84      3.29    1.14 
Reaction Time 2: Total RT (hsec)         45    63.22     8.57    0.35     26     67.58     12.45    2.43 
Memory: Percent Correct                  45    90.89     5.75    0.46     26     85.54     12.82    0.51 
Memory: Grand Mean (hsec)               [?]   110.13    22.45    0.16     26    118.00     30.38    1.49 
Perceptual Speed & Accuracy: 
  Percent Correct                       [?]    85.84     5.85    0.75     26     79.50      9.91    0.29 
Perceptual Speed & Accuracy: 
  Grand Mean (hsec)                      45   287.96    53.92    0.94     26    274.58     73.93    0.45 
[Target Identification: 
  Percent Correct]                       45    94.00     4.60    0.21     26     90.54      9.46    0.87 
Target Identification: Total RT (hsec)   45   190.02    49.24    1.13     26    208.62     57.67    1.59 
Tracking 1: Distance                     45  1548.31   458.60    1.41     26   2608.58   1567.33    0.42 
Tracking 2: Distance                     45  3410.29  1864.34    2.61*    26   5161.27   2740.69    1.42 
Target Shoot: Percent Hits               45    63.22     9.35    0.37     25     58.88     10.75    0.10 
Target Shoot: Distance                   45   789.71   153.93    0.44     25    915.12    311.22    0.22 

(a) Degrees of freedom = 4, 40; F for alpha = .05 is 2.60. 
(b) Degrees of freedom = 4, 21; F for alpha = .05 is 2.87. 
 *  Significant at the .05 level. 

CHAPTER 6 



PERCEPTUAL/PSYCHOMOTOR COMPUTER-ADMINISTERED MEASURES: FIELD TEST 



Jeffrey J. McHenry, Jody L. Toquam, Rodney L. Rosse, 
Norman G. Peterson, and Matthew K. McGue 



In this chapter we describe analyses of the field test of the
perceptual/psychomotor computer-administered measures in the Pilot Trial
Battery, administered at Fort Knox in September 1984. The procedures and
sample for this field test were described in Chapter 2, and the development
and pilot testing of the computer-administered portion of the battery were
described in Chapter 5. We note here that portions of this chapter are drawn
from McHenry and McGue (1985) and Toquam et al. (1985).

We present descriptions of the tests and discuss scoring issues and
decisions. Descriptive statistics, reliability estimates, and uniqueness
estimates for dependent measures or test scores are shown. The analyses of
effects of video-game experience, computer testing station, and practice on
test scores are presented. Finally, the covariance of computer-administered
test scores with each other, with the cognitive paper-and-pencil
measures in the Pilot Trial Battery, and with ASVAB scores is presented.



PERCEPTUAL/PSYCHOMOTOR COMPUTERIZED TESTS ADMINISTERED 

A concise description of each of the computer-administered tests 
included in the Pilot Trial Battery, along with a sample item or items from 
each test, is contained in Figure 6.1. Copies of the full Pilot Trial 
Battery administered at Fort Knox are contained in Appendix 6. As Figure
6.1 shows, there are ten computer-administered tests in the Pilot Trial 
Battery, and these tests were intended to measure six constructs: Reaction 
Time, Perceptual Speed and Accuracy, Memory, Movement Judgment, Precision/ 
Steadiness, and Multilimb Coordination. 
CONSTRUCT/MEASURE          DESCRIPTION OF TEST          SAMPLE ITEM



REACTION TIME

Simple Reaction Time
(Reaction Time Test 1)

The subject is instructed to place his/her hands on the green "home"
buttons or in the Ready position. When the subject's hands are in the
Ready position, a small box appears on the screen. After a delay period
which varies from 1.5 to 3.0 seconds, the word YELLOW appears in the box.
At this point, the subject must remove his/her preferred hand from the
"home" buttons to strike the Yellow key on the testing panel. The subject
must then return his/her hands to the ready position to receive the next
item. The test contains 15 items. Although it is self-paced, subjects are
given 10 seconds to respond before the computer times out and prepares to
present the next item.

Choice Reaction Time
(Reaction Time Test 2)

Choice reaction time is assessed for two response alternatives only. This
measure is obtained in virtually the same manner as the simple reaction
time measure. The major difference involves stimulus presentation. Rather
than presenting the same stimulus (YELLOW) on each trial, the stimulus
varies. That is, subjects may see either of the stimuli BLUE or WHITE on
the computer screen. When the stimulus appears, the subject is instructed
to move his/her preferred hand from the "home" keys to strike the key that
corresponds with the term (BLUE or WHITE) appearing on the screen. This
test contains 15 items. Although the test is self-paced, the computer is
programmed to allow the subject 9 seconds to respond before going on to
the next item.



Figure 6.1. Description of Perceptual/Psychomotor Computer-Administered Measures
in Field Test. (Page 1 of 8)



CONSTRUCT/MEASURE          DESCRIPTION OF TEST          SAMPLE ITEM



PERCEPTUAL SPEED AND ACCURACY

Perceptual Speed
and Accuracy Test

This test is designed to measure the ability to compare rapidly two visual
stimuli presented simultaneously and determine whether they are the same
or different. At the beginning of each trial, the subject is instructed to
hold down the home keys. After a brief delay, the stimuli are presented.
The subject must decide whether the stimuli are the same or different.
He/she must then depress a white button if the stimuli are the same or a
blue button if the stimuli are different. Four different "types" of
stimuli are used: alpha, numeric, symbolic, and words. Within the alpha,
numeric, and symbolic stimuli, the length of the stimulus is varied. Three
different levels of length are presented: two-character, five-character,
and nine-character. The test consists of [number illegible] trials. The
primary dependent variable is the subject's average response time across
all trials in which the subject makes a correct response.



Figure 6.1. Description of Perceptual/Psychomotor Computer-Administered Measures
in Field Test. (Page 2 of 8)
CONSTRUCT/MEASURE          DESCRIPTION OF TEST          SAMPLE ITEM



PERCEPTUAL SPEED AND ACCURACY
(Continued)

Target Identification Test

This test was designed to be a job-relevant measure of perceptual speed
and accuracy. In this test, the subject is presented with a target object
and three stimulus objects. The objects are pictures of military vehicles
or aircraft (e.g., tanks, planes, helicopters). The target object is the
same as one of the stimulus objects. However, the target may be rotated or
reduced in size relative to its stimulus counterpart, or the target may be
"moving" and growing across the screen. The subject must determine which
of the three stimulus objects is the same as the target object and then
press a button on the response pedestal corresponding to that choice. The
test consists of 48 items; 24 are stationary, 24 are moving. The primary
dependent variable is the subject's average response time across all
trials in which the subject makes a correct response.



Figure 6.1. Description of Perceptual/Psychomotor Computer-Administered Measures
in Field Test. (Page 3 of 8)



CONSTRUCT/MEASURE          DESCRIPTION OF TEST          SAMPLE ITEM



MEMORY

Short-Term Memory Test

At the computer console, the subject is instructed to place his/her hands
on the green home buttons. The first stimulus set then appears on the
screen. A stimulus contains one, three, or five objects (letters or
symbols). Following a delay period, the stimulus set disappears. When the
probe appears, the subject must decide whether or not it was part of the
stimulus set. If the probe was present in the stimulus set, the subject
must strike the white key on the response pedestal. If the probe was not
present, the subject must strike the blue key. The test includes 48 items.
The primary dependent variable is the subject's average response time
across those trials in which the subject makes a correct response.

Sample item: Item set; Probe; IN (White); NOT IN (Blue)

Number Memory Test

At the beginning of each trial of this test, the subject is presented with
a single number on the computer screen. After studying the number, the
subject is instructed to push a button to receive the next part of the
problem. When the subject presses the button, the first part of the
problem disappears and another number appears along with an operation term
(e.g., Add or Subtract 6). Once the subject has combined the first number
with the second, he/she must press a button to receive a new number and
operation term. This procedure continues until a solution to the problem
is presented. The subject must then indicate whether the solution
presented is correct or incorrect. In total, the test consists of 27 such
items.

Sample item: Start with 14; Divide by 7; Multiply by [illegible]



Figure 6.1. Description of Perceptual/Psychomotor Computer-Administered Measures
in Field Test. (Page 4 of 8)

CONSTRUCT/MEASURE          DESCRIPTION OF TEST          SAMPLE ITEM



MOVEMENT JUDGMENT

Cannon Shoot Test

At the beginning of each trial of this test, a stationary cannon appears
on the computer console. The starting position of this cannon varies from
trial to trial (i.e., it is positioned on the top, bottom, or side of the
screen). The cannon is capable of firing a shell. The shell travels at a
constant speed on each trial. Shortly after the cannon appears, a circular
target moves onto the screen. This target moves in a constant direction at
a constant rate of speed throughout the trial, though the speed and
direction vary from trial to trial. The subject's task is to push a
response button to fire the shell such that the shell intersects the
target when the target crosses the shell's line of fire. The test includes
48 items. The primary dependent variable is a deviation score indicating
the difference between time of fire and optimal fire time (e.g., direct
hits yield a deviation score of zero).
CONSTRUCT/MEASURE          DESCRIPTION OF TEST          SAMPLE ITEM



PRECISION/STEADINESS

Target Tracking Test 1

This is a pursuit tracking test. On each trial of the test, subjects are
shown a path consisting entirely of vertical and horizontal line segments.
At the beginning of the path is a target box. Centered in the box is a
crosshair. As the trial begins, the target starts to move along the path
at a constant rate of speed. The subject's task is to keep the crosshair
centered within the target at all times. The subject uses a joystick to
control movement of the crosshair. The subject's score on this test is the
average distance from the center of the crosshair to the center of the
target across all 27 test trials.



Figure 6.1. Description of Perceptual/Psychomotor Computer-Administered Measures
in Field Test. (Page 6 of 8)
CONSTRUCT/MEASURE          DESCRIPTION OF TEST          SAMPLE ITEM



PRECISION/STEADINESS
(Continued)

Target Shoot Test

At the beginning of a trial on this test, a crosshair appears in the
center of the screen and a target box appears at some other location on
the screen. The target then begins to move about the screen in an
unpredictable manner, frequently changing speed and direction. The subject
can control movement of the crosshair using a joystick. The subject's task
is to move the crosshair into the center of the target. When this has been
accomplished, the subject must press a red button on the response pedestal
to "fire" at the target. The subject must do this before the time limit on
each trial is reached. The subject receives three scores on this test. The
first is the percentage of "hits" (i.e., the subject fires at the target
when the crosshair is inside the target box). The second is the average
time elapsed from the beginning of the trial until the subject fires at
the target. The third score is the average distance from the center of the
crosshair to the center of the target at the time the subject fires at the
target. The test consists of 35 trials.



Figure 6.1. Description of Perceptual/Psychomotor Computer-Administered Measures
in Field Test. (Page 7 of 8)



CONSTRUCT/MEASURE          DESCRIPTION OF TEST          SAMPLE ITEM



MULTILIMB COORDINATION

Target Tracking Test 2

This is a test of multilimb coordination. The test is virtually identical
to Target Tracking Test 1. The only difference is that the subject must
use two sliding resistors (instead of a joystick) to control movement of
the crosshair. The first sliding resistor controls movement of the
crosshair in the vertical plane, while the second sliding resistor
controls movement of the crosshair in the horizontal plane. As with Target
Tracking Test 1, the subject's score on this test is the average distance
from the center of the crosshair to the center of the target across all 27
test trials.



Figure 6.1. Description of Perceptual/Psychomotor Computer-Administered Measures
in Field Test. (Page 8 of 8)
ANALYSIS OF DATA FROM FIELD TEST ADMINISTRATION



Table 6.1 shows means, standard deviations, and reliability estimates
for 19 scores or dependent measures for the 10 computer-administered tests.
Before discussing this table and other aspects of the field test data
analysis, we make a few remarks about the methods used to score these
tests. In general, the methods employed were similar to those used at Fort
Lewis (described in Chapter 5), but analyses of the Fort Knox field test
data occasionally indicated a change was desirable.

Field Test Scoring Procedures

The perceptual computer-administered tests (see Table 6.1) generally
yield one or both of two types of scores: accuracy and speed (except for
the Cannon Shoot Test, discussed later). For example, Perceptual Speed and
Accuracy yields percent of items correct (accuracy) and mean reaction time
(speed).

In addition, two derived measures can be computed for the perceptual
tests: the slope and the intercept obtained when reaction times are
regressed against an important defining characteristic of test items (which
we called a "parameter"). For Perceptual Speed and Accuracy, this
characteristic was the number of stimuli or characters being compared in an
item (i.e., 2, 5, or 9 characters). In terms of speed of processing, the slope
represents the average increase in reaction time with an increase of one
character in the stimulus set; thus, the lower the value, the faster the
comparison. The intercept represents all other processes not involved in
comparing stimuli, such as encoding the stimuli and executing the response.
Of course, these two measures can be used only when the test is well enough
understood to allow the appropriate construction of items to tap a defining
characteristic or parameter.
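
The slope and intercept described above can be illustrated with an ordinary least-squares fit of reaction time on stimulus length. This is only a minimal sketch: the function name and the sample reaction times are hypothetical, and the report does not state which fitting routine was actually used.

```python
def fit_slope_intercept(lengths, reaction_times):
    """Least-squares fit of RT = intercept + slope * length for one subject."""
    n = len(lengths)
    if n != len(reaction_times) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(lengths) / n
    mean_y = sum(reaction_times) / n
    sxx = sum((x - mean_x) ** 2 for x in lengths)
    if sxx == 0:
        raise ValueError("stimulus lengths must vary")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(lengths, reaction_times))
    slope = sxy / sxx                      # added time per extra character
    intercept = mean_y - slope * mean_x    # time for all other processes
    return slope, intercept


# Hypothetical correct-response RTs (hundredths of a second) at 2, 5, 9 characters
lengths = [2, 2, 5, 5, 9, 9]
rts = [100.0, 104.0, 145.0, 149.0, 205.0, 209.0]
slope, intercept = fit_slope_intercept(lengths, rts)
```

With these made-up values the slope is in hundredths of a second per character and the intercept in hundredths of a second, matching the units used in Table 6.1.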

Reaction times on all tests were computed only for correct responses
because it seemed to make very little sense to include incorrect responses.
Subjects could simply respond at random and receive an excellent reaction
time score if incorrect responses were included. This strategy means that
items on most tests should be constructed so that subjects could answer
every item correctly if given enough time, and that enough time is given.
We did follow this strategy. Consequently, the speed measures (reaction
time) were expected, in general, to have more variance and be more
meaningful than the accuracy measures.

Several issues revolved around the choice of the particular way to
measure reaction time. As noted in Chapter 5, total reaction time is made
up of two components, decision time and movement time. Analyses of Fort
Knox field test data indicated that total reaction time and decision time
were very highly correlated and, since movement time is conceptually
uninteresting, we elected to use total reaction time for all reaction time
tests.

Means or medians across items could be used to compute the total
reaction time scores. These could be trimmed (i.e., highest and lowest
items not included in the calculation) or untrimmed (all items included).
We looked at score distributions, intercorrelations of the various scores,
and reliabilities of the scores in order to decide which method to use.
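
The candidate summaries compared above can be sketched as follows. The function name and the item reaction times are hypothetical, and the trimming rule (drop exactly one highest and one lowest item) is taken from the parenthetical definition in the text.

```python
from statistics import mean, median


def trimmed_mean(values):
    """Mean after dropping the single highest and single lowest item."""
    if len(values) < 3:
        raise ValueError("need at least three values to trim both ends")
    ordered = sorted(values)
    return mean(ordered[1:-1])


# Hypothetical item RTs (hundredths of a second) with one slow response
item_rts = [52.0, 55.0, 58.0, 60.0, 140.0]
untrimmed = mean(item_rts)        # pulled upward by the 140.0 response
trimmed = trimmed_mean(item_rts)  # ignores the extreme items
middle = median(item_rts)
```

On a short test a single slow response moves the untrimmed mean a great deal, which is the practical reason trimming matters more when a test has few items.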
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Table 6.1 



Characteristics of the 19 Dependent Measures for Computer-Administered
Tests: Fort Knox Field Tests (N = 256)a

                                                                     Reliabilityb
Dependent Measure                      Mean          SD              Split-Half   Test-Retest
                                                                     (r_xx)       (r_tt)

Simple Reaction Time (SRT)
  Mean Reaction Time (RT)              56.23 hsc     18.83 hs        .90          .37

Choice Reaction Time (CRT)
  Mean Reaction Time (RT)              67.41 hs      10.20 hs        .89          .56

Perceptual Speed and Accuracy (PS & A)
  Percent Correct (PC)                 88%           8%              .83          .59
  Mean Reaction Time (RT)              325.61 hs     70.38 hs        .96          .65
  Slope                                [illegible]   15.56 hs/chd    .88          .67
  Intercept                            67.96 hs      45.02 hs        .74          .55

Target Identification
  Percent Correct (PC)                 90%           10%             .84          .19
  Mean Reaction Time (RT)              528.70 hs     133.96 hs       .96          .67

Short-Term Memory (STM)
  Percent Correct (PC)                 85%           8%              .72          .34
  Mean Reaction Time (RT)              129.68 hs     23.84 hs        .94          .78
  Slope                                7.22 hs/ch    4.53 hs/ch      .52          .47
  Intercept                            108.12 hs     [illegible]     .84          .74

Number Memory
  Percent Correct (PC)                 83%           [illegible]     .63          .53
  Mean Operation Time (RT)             230.71 hs     [illegible]     .95          .88

Cannon Shoot
  Time Error (TE)                      78.60 hs      20.28 hs        .88          .66

PSYCHOMOTOR

Target Track 1
  Mean Log Distance                    3.22          .44             .97          .68

Target Shoot
  Mean Time to Fire (std) (TF)         -.01          .48             .91          .48
  Mean Log Distance (std)              -.01          .41             .86          .58

Target Track 2
  Mean Log Distance                    3.91          .49             .97          .68


a N varies slightly from test to test.

b N = 120 for test-retest reliabilities, but varies slightly from test to
  test. r_xx = split-half reliability; odd-even item correlation with
  Spearman-Brown correction. r_tt = test-retest reliability, two-week
  interval between administrations.

c hs = hundredths of a second.

d hs/ch = hundredths of a second per character.
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Generally, there were no striking differences between the methods. We 
decided to use untrimmed means for all tests except Simple and Choice
Reaction Time, where an extreme response could affect the mean much more
because these tests had a much smaller number of items. Means were
selected over medians because they had slightly higher reliabilities.

Another scoring issue concerns missing data. Since a subject may not
get all items correct on a particular test, some information is missing
when total reaction time, slope, and intercept are being computed
for that subject. Therefore, we established a maximum number of missing
items that would be permitted for each test. This limit for all tests,
with the exception of Number Memory, was set at 10 percent. Hence, for
Simple and Choice Reaction Time, subjects could miss up to two items; for
Short-Term Memory, Perceptual Speed and Accuracy, and Target
Identification, [illegible]. Because Number Memory requires subjects to
provide several responses for a single item, the possibility of missing
data is greater. To ensure that sufficient numbers of subjects were
retained, we permitted subjects to miss up to seven of the 27
items in this test.
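
The two rules just described (score reaction time on correct responses only, and withhold a score when too many items are missing) might be combined as in the sketch below. The function name, the data layout, and the choice to return None for an unscorable subject are assumptions for illustration; the permitted number of missing items is passed in explicitly rather than derived from the 10 percent figure, because the report does not state the rounding rule.

```python
def mean_correct_rt(responses, max_missing):
    """Mean RT over correct responses for one subject on one test.

    responses   -- list of (reaction_time, is_correct) pairs, one per item
    max_missing -- largest number of incorrect (unscorable) items allowed
    Returns None when the subject exceeds the missing-item limit.
    """
    correct_rts = [rt for rt, is_correct in responses if is_correct]
    n_missing = len(responses) - len(correct_rts)
    if n_missing > max_missing or not correct_rts:
        return None
    return sum(correct_rts) / len(correct_rts)


# Hypothetical 15-item reaction time test with a two-item limit
within_limit = [(50.0, True)] * 13 + [(80.0, False)] * 2
over_limit = [(50.0, True)] * 12 + [(80.0, False)] * 3
score_ok = mean_correct_rt(within_limit, max_missing=2)
score_dropped = mean_correct_rt(over_limit, max_missing=2)
```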

The Percent Correct and Mean Operation Time scores for the Number
Memory Test are new. This test was not administered at Fort Lewis;
therefore, these scores were not discussed in Chapter 5.
Percent Correct is simply the percentage of items that the subject answered
correctly. Mean Operation Time is the mean of the mean reaction times to
the four arithmetic operations (multiply, divide, add, and subtract). That
is, for each subject, a mean reaction time for processing all the
multiplication operations was computed; a separate mean for all the division
operations, and so on for the two other operations. The mean of these four
operation reaction time means was then computed and labeled Mean Operation
Time.

The procedures for scoring the Cannon Shoot Test differed from those used
to score the other cognitive/perceptual tests. A reaction time score was
inappropriate because the task requires the subject to ascertain the
optimal time to fire to ensure a direct hit on the target. (See
description of Cannon Shoot Test, Figure 6.1.)
Therefore, responses on this measure were scored by computing a deviation
score that is composed of the difference between the time the subject fired
and the optimal time to fire. These scores are summed across all items for
each subject and a mean deviation time score is computed.
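
The two scores described above can be sketched in a few lines. Function names and data are hypothetical. Taking the absolute value of each Cannon Shoot deviation is an assumption: the text says only "difference", and a signed difference would let early and late shots cancel.

```python
def mean_operation_time(trials):
    """Mean of the per-operation mean RTs for one subject.

    trials -- list of (operation, reaction_time) pairs
    """
    by_operation = {}
    for operation, rt in trials:
        by_operation.setdefault(operation, []).append(rt)
    operation_means = [sum(v) / len(v) for v in by_operation.values()]
    return sum(operation_means) / len(operation_means)


def mean_time_error(fire_times, optimal_times):
    """Mean deviation of actual from optimal fire time (absolute value assumed)."""
    deviations = [abs(f - o) for f, o in zip(fire_times, optimal_times)]
    return sum(deviations) / len(deviations)


# Hypothetical data in hundredths of a second
trials = [("add", 100.0), ("add", 120.0), ("subtract", 130.0),
          ("multiply", 200.0), ("divide", 250.0)]
mot = mean_operation_time(trials)
te = mean_time_error([10.0, 20.0, 35.0], [12.0, 20.0, 30.0])
```

Averaging the four operation means weights each operation equally, so the result can differ from the plain mean over all responses when operations occur with unequal frequency.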

Scoring of two of the three psychomotor tests, Target Tracking Tests 1
and 2, was straightforward. During each trial, the distance
from the center of the crosshair to the center of the target was computed
approximately 16 times per second, or almost 350 times per trial. These
distances were then averaged to determine the mean distance from the
crosshair to the target for each trial.

However, the frequency distribution of these mean distance scores
proved to be highly positively skewed, the skewness coefficient for some
trials being in excess of 5 and 6. Therefore, subjects' mean distance
scores for each trial were transformed using the natural logarithm
transformation. The overall test score for each subject was then the mean of
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the log (mean distance) scores across the 27 trials of each test. 
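
A minimal sketch of the tracking score follows: average the sampled distances within a trial, take the natural log of each trial mean, then average the logs across trials. The function names and coordinates are hypothetical, and Euclidean distance between crosshair and target centers is an assumption, since the text says only "distance".

```python
import math


def trial_mean_distance(crosshair_xy, target_xy):
    """Average distance between paired crosshair and target samples in one trial."""
    distances = [math.dist(c, t) for c, t in zip(crosshair_xy, target_xy)]
    return sum(distances) / len(distances)


def mean_log_distance(trial_means):
    """Overall score: mean of ln(mean distance) across trials."""
    return sum(math.log(d) for d in trial_means) / len(trial_means)


# Two hypothetical samples from one trial: distances of 5 and 10 screen units
one_trial = trial_mean_distance([(0, 0), (0, 0)], [(3, 4), (6, 8)])

# Two hypothetical trial means chosen so the logs are 1 and 3
overall = mean_log_distance([math.e, math.e ** 3])
```

The log transform compresses the long right tail, so a few very poor trials no longer dominate a subject's overall score. A trial mean of exactly zero would need special handling because its logarithm is undefined.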

Scoring of the Target Shoot Test was a bit more complicated. Three
overall test scores were generated for each subject: (1) the percentage of
hits; (2) the mean distance from the center of the crosshair to the center
of the target at the time of firing (the distance score); and (3) the mean
time elapsed from the start of the trial until firing (the time-to-fire
score). Percentage of hits was a less desirable measure because it
contains relatively little information compared to the distance measure.
Complications arose because subjects received no distance or time-to-fire
scores on trials where they failed to fire at the target before the time
limit for the trial elapsed. This scoring procedure resulted in
considerable missing data; moreover, the missing data occurred primarily on
the most difficult items of the test, where only the adept subjects were
able to maneuver the crosshair close enough to the target to fire.

Therefore, as a first step in computing overall distance and
time-to-fire scores for the Target Shoot Test, the distance and time-to-fire
scores for each trial were standardized. That is, the mean and standard
deviation of the distance score were computed for each item or trial on the
test. Then, each subject was assigned a standard score on each trial by
subtracting the item mean from his/her obtained distance score and dividing by
the item standard deviation. For each subject, the overall distance and
time scores were then computed by averaging these standardized scores across
all trials in which the subject fired at the target.
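
The per-trial standardization can be sketched as below. The function name and data are hypothetical, None marks a trial on which the subject did not fire, and the use of the population standard deviation (`pstdev`) is an assumption because the report does not say which form was used.

```python
from statistics import mean, pstdev


def standardized_overall_scores(scores):
    """scores[s][t] is subject s's raw score on trial t, or None if no shot.

    Returns one overall score per subject: the mean of that subject's
    within-trial z-scores over the trials on which he/she fired.
    """
    n_trials = len(scores[0])
    trial_stats = []
    for t in range(n_trials):
        observed = [row[t] for row in scores if row[t] is not None]
        if observed:
            trial_stats.append((mean(observed), pstdev(observed)))
        else:
            trial_stats.append((0.0, 0.0))  # nobody fired on this trial
    overall = []
    for row in scores:
        z = [(x - m) / sd
             for x, (m, sd) in zip(row, trial_stats)
             if x is not None and sd > 0]
        overall.append(mean(z) if z else None)
    return overall


# Three hypothetical subjects, two trials; subject 2 did not fire on trial 2
raw = [[1.0, 10.0],
       [2.0, None],
       [3.0, 30.0]]
overall = standardized_overall_scores(raw)
```

Because each trial is put on its own z-scale first, a subject who fired only on easy trials is not credited with an artificially good raw mean.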

Mean Scores and Reliability Estimates 

The means and standard deviations in Table 6.1 provide information 
about the score distributions. Note that the Percent Correct scores for 
Perceptual Speed and Accuracy, Target Identification, and Short-Term Memory 
are high, and the standard deviations are not large, as had been expected.
The Reaction Time scores for these tests do have sufficient variance. 

The split-half reliabilities range from .52 (Short-Term Memory Slope)
to .96 (for two scores). Besides the Short-Term Memory Slope, only the
Number Memory Percent Correct score is undesirably low (.63). All others
are .74 or higher. These split-half reliabilities are odd-even
correlations corrected to full test length, but note that they do not suffer
from the artifactual inflation that speeded paper-and-pencil measures do. This
is because all items are attempted by every subject.
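
The odd-even split-half estimate with the Spearman-Brown correction to full test length can be sketched as follows. Function names and the item scores are hypothetical; scoring each half by its mean item score is an assumption, and the correction shown is the standard formula 2r / (1 + r).

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown(r_half):
    """Step a half-test correlation up to full test length."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores[s] lists subject s's item scores in presentation order."""
    odd = [mean(row[0::2]) for row in item_scores]    # items 1, 3, 5, ...
    even = [mean(row[1::2]) for row in item_scores]   # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd, even))


# Hypothetical item scores for three subjects on a four-item test
items = [[1, 2, 1, 2],
         [2, 3, 2, 3],
         [4, 5, 4, 5]]
r_xx = split_half_reliability(items)
stepped_up = spearman_brown(0.5)
```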

The test-retest reliabilities are lower than the split-half reliabili- 
ties, as is typically the case. Three are so low as to cast doubt on the 
usefulness of the score: Simple Reaction Time Mean Reaction Time (.37), 
Target Identification Percent Correct (.19), and Short-Term Memory Percent 
Correct (.34). However, the two Percent Correct scores are not viewed as 
the primary score for their tests, and Simple Reaction Time is viewed 
largely as a "warm up" test. Although seven of the other scores have test- 
retest reliabilities below .60, there appears to be sufficient stability in 
these scores to warrant their possible use as predictors. 
Uniqueness Estimates of Computer-Administered Test Scores



Table 6.2 shows uniqueness estimates for the 19 scores when regressed
against the ASVAB subtests and the other computer-administered scores. The
pattern of results here is similar to that found for the cognitive
paper-and-pencil tests, except that the computer-administered tests have even
higher U² coefficients, and thus show promise for adding to the validity
obtained by the ASVAB. The exceptions are the Number Memory scores. The
two scores have lower uniqueness for ASVAB than for other computer tests.
Several ASVAB subtests measure arithmetic and mathematical ability
(Arithmetic Reasoning, Numerical Operations, and Mathematics Knowledge) and the
Number Memory Test requires the use of the four basic arithmetic
operations, so this finding, in retrospect, is not too surprising.

Later in this chapter we present the results of a factor analysis of
the computer-administered test scores and the ASVAB subtest scores, which
gives additional information about the overlap between these two sets of
tests.
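
As a worked check of the uniqueness estimate, the sketch below applies the definition given in the notes to Table 6.2 (split-half reliability minus the shrinkage-corrected R²) to a few rows of that table. The function name is hypothetical; the input values are those reported in Table 6.2.

```python
def uniqueness(reliability, r_squared):
    """U^2 = reliability minus the variance shared with the other battery."""
    return reliability - r_squared


# (split-half reliability, R^2 with ASVAB, R^2 with other computer tests)
rows = {
    "Simple RT: Mean RT": (0.90, 0.07, 0.35),
    "Number Memory: Percent Correct": (0.63, 0.40, 0.18),
    "Cannon Shoot: Time Error": (0.88, 0.02, 0.12),
}

estimates = {
    name: (round(uniqueness(rel, r2_asvab), 2),
           round(uniqueness(rel, r2_computer), 2))
    for name, (rel, r2_asvab, r2_computer) in rows.items()
}
```

The Number Memory row shows the exception noted in the text: its uniqueness against the ASVAB is lower than its uniqueness against the other computer tests.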

Correlations with Video Game-Playing Experience

Table 6.3 shows correlations of the 19 computer-administered test
scores with the subject's previous experience playing video games. In the
computer-administered tests, this question was asked: "In the last couple
years, how much have you played video games on arcade machines, home video
games or home computers?" Subjects selected one of the following five
answers: "You have NEVER played video games," "You have tried a few games,
but have generally played less than once a month," "You have played several
times a month," "You have played at least once or twice a week," "You have
played video games almost every day." These answers were given numeric
values from 1 to 5, respectively. The mean score on this question was
2.99, SD = 1.03 (N = 256), and the test-retest reliability was .71 (N = [illegible]).



Nine of the 19 correlations reached statistical significance at
the .05 level, including three of the four scores from the psychomotor
tests (Target Tracking 1 and 2 Mean Log Distances and Target Shoot Mean Log
Distance). The Cannon Shoot score also showed a statistically significant
correlation. Perceptual Speed and Accuracy, Target Identification, and
Number Memory test scores showed no significant correlations, although
Short-Term Memory did. The correlations are fairly low in general; the
highest one is .27 with Target Shoot Mean Log Distance.

We interpret these findings as showing a small, but significant,
relationship of video game-playing experience to the more "game-like" tests
in the battery (i.e., the psychomotor tests), and a smaller, probably not
meaningful, relationship with the cognitive/perceptual kinds of tests (with
the possible exception of Short-Term Memory).

Effects of Differences in "Machine" or Computer Testing Station

We repeated the investigation which had been done at the pilot test at
Fort Lewis on the effect of machine or computer testing station differences
on computer-administered test scores. There were six computer testing
stations in the field test, and approximately 40 male soldiers had been 
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Table 6.2 



Uniqueness Estimates for the 19 Scores on Computer-Administered Tests in the
Pilot Trial Battery Against Other Computer Scores and Against ASVAB

                                   Reliability              ASVAB               Other Computer Testsa
                                   Split-Half  Test-Retest  R² with     U²c     R² with            U²c
Score                                                       ASVABb              Computer Scoresb

Simple Reaction Time
  Mean Reaction Time               .90         .37          .07         .83     .35                .55

Choice Reaction Time
  Mean Reaction Time               .89         .56          .09         .80     .44                .45

Perceptual Speed and Accuracy
  Percent Correct                  .83         .59          .14         .69     .42                .41
  Mean Reaction Time               .96         .65          .06         .90     .40                .56
  Slope                            .88         .67          .09         .79     .29                .59
  Intercept                        .74         .55          .11         .63     .19                .55

Target Identification
  Percent Correct                  .84         .19          .05         .79     .25                .59
  Mean Reaction Time               .96         .67          .16         .80     .64                .33

Short-Term Memory
  Percent Correct                  .72         .34          .10         .62     .38                .34
  Mean Reaction Time               .94         .78          .06         .88     .36                .58
  Slope                            .52         .47          .01         .51     .17                .35
  Intercept                        .84         .74          .11         .73     .34                .50

Number Memory
  Percent Correct                  .63         .53          .40         .23     .18                .45
  Mean Operation Time              .95         .88          .33         .62     .12                .83

Cannon Shoot
  Time Error                       .88         .66          .02         .86     .12                .76

Target Track 1
  Mean Log Distance                .97         .68          .23         .74     .69                .28

Target Shoot
  Mean Time to Fire                .91         .48          .06         .85     .10                .81
  Mean Log Distance                .86         .58          .11         .75     .33                .53

Target Track 2
  Mean Log Distance                .97         .77          .17         .80     .67                .30


a In computing the R² with other computer tests, each test score was predicted
  using only the test scores from the remaining nine computer tests. Thus, for
  example, STM-Intercept was not used as a predictor in estimating STM-Mean RT.

b The R² with the ASVAB and with the other computer-administered tests were
  corrected for shrinkage that would be expected with cross-validation. N = 182
  for R² computations.

c Uniqueness estimates (U²) were computed using the split-half reliability
  estimate. The uniqueness is equal to the reliability minus the R² with the
  ASVAB or with the other computer tests. It is a measure of the unique,
  reliable variance that each test score might contribute to the prediction of
  job performance criteria.
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Table 6.3 

Correlations Between Computer Test Scores and Previous Experience With
Video Games (N = 256)a

Computer Test                    Test Score            Correlationb

Simple Reaction Time             Mean RT                .12*

Choice Reaction Time             Mean RT                .15*

Perceptual Speed & Accuracy      Percent Correct       -.01
                                 Mean RT                .01
                                 Slope                 -.03
                                 Intercept              .06

Target Identification            Percent Correct        .08
                                 Mean RT                .05

Short-Term Memory                Percent Correct        .13*
                                 Mean RT                [illegible]
                                 Slope                 -.16*
                                 Intercept              .18*

Number Memory                    Percent Correct        .08
                                 Mean RT                .00

Cannon Shoot                     Time Error             .18*

Target Tracking 1                Mean Log Distance      .22*

Target Shoot                     Mean Time to Fire      .10
                                 Mean Log Distance      .27*

Target Tracking 2                Mean Log Distance      .16*


a N varies slightly by test.

b Correlations of .12 or greater are statistically significant
  at the .05 level, two-tailed test of significance.
  Signs of correlations have been reflected, where appropriate,
  so that greater video experience shows positive
  correlation with better test performance.
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tested at each station. (We used only males in this analysis to avoid 
confounding the results with gender differences, since the 47 females 
tested were not evenly balanced across the six testing stations. Also, 
only males with complete sets of computer test scores were used so the
analyses would have the same sample for each test score.)

We ran a one-way multivariate analysis of variance (MANOVA) for the 19
computer test scores, with six "machine" levels. As Table 6.4 shows,
machine differences had no statistically significant effect on test scores.
The MANOVA likelihood
ratio was .99 (p value = .50). Table 6.4 also shows the univariate F ratio
and p values for each of the 19 scores. None of them reached statistical 
significance at the .05 level, again indicating that the testing station 
had no significant effect on these 19 scores. 
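
The univariate tests reported in Table 6.4 are one-way analyses of variance with testing station as the factor. A minimal sketch of the F ratio for one score follows; the function name and the small data set are hypothetical, and the multivariate (MANOVA) test is not reproduced here.

```python
def one_way_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # Variation of individual scores around their own group mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Three hypothetical "stations" with three subjects each
f_ratio, df_b, df_w = one_way_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

With six stations the between-groups degrees of freedom are 5, and the 200 within-groups degrees of freedom noted in Table 6.4 correspond to 206 subjects in total.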

These results were especially encouraging because they replicated a
similar set of results from the earlier Fort Lewis pilot test (see Chap- 
ter 5). The results showed that the hardware and software used in the 
computer-administered battery had, indeed, resulted in a standardized 
testing situation across the six machines and testing stations. We think 
this is due in large part to the calibration software used to make the 
hardware equivalent across stations, as described in Chapter 1. 

Table 6.4

Effects of Machine Differences on Computer Test Scores(a):
Fort Knox Field Test

Computer Test Score                       F(b)           p

Simple Reaction Time
  Mean Reaction Time                      1.59          .16

Choice Reaction Time
  Mean Reaction Time                       .52          .76

Perceptual Speed and Accuracy
  Percent Correct                         1.18          .32
  Mean Reaction Time                       .56          .73
  Slope                                    .84          .53
  Intercept                                .85          .52

Target Identification
  Percent Correct                         1.67          .14
  Mean Reaction Time                       .93          .46

Short-Term Memory
  Percent Correct                          .11          .99
  Mean Reaction Time                       .95          .45
  Slope                                   1.13          .34
  Intercept                                .64          .67

Number Memory
  Percent Correct                          .56          .73
  Mean Operation Time                     1.55          .17

Cannon Shoot
  Time Error                       [illegible]   [illegible]

Target Track 1
  Mean Log Distance                        .62          .69

Target Shoot
  Mean Time to Fire                       1.91          .09
  Mean Log Distance                       1.01          .41

Target Track 2
  Mean Log Distance                        .86          .51

(a) MANOVA likelihood ratio = .99, p = .50 for these test scores.

(b) Degrees of freedom (df) = 5,200 for all 19 test scores.

EFFECTS OF PRACTICE ON SELECTED COMPUTER-ADMINISTERED TEST SCORES



During the Fort Knox field test, data were collected to investigate 
the effects of practice on computer test scores. The experimental design 
for this work is shown in Figure 6.2. In accordance with this design, a 
statistically significant Time x Group interaction would indicate that a 
practice effect had occurred. 
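
As a rough illustration of this design, the sketch below computes the degrees of freedom for each source of variance in a Groups x Time analysis with repeated measures on Time. The function name is hypothetical, and the total of 182 subjects is an assumption chosen only because it reproduces the error df of 180 reported for Choice Reaction Time in Table 6.5.

```python
def mixed_design_df(n_groups, n_subjects_total, n_times):
    """Degrees of freedom for a Groups x Time design (repeated measures on Time).

    With A groups, B subjects per group, and C testing times, subjects within
    groups have (B-1)A df, which equals N - A for N subjects in total.
    """
    subjects_within_groups = n_subjects_total - n_groups
    return {
        "Group": n_groups - 1,
        "Subjects (Group)": subjects_within_groups,
        "Time": n_times - 1,
        "Time x Group": (n_times - 1) * (n_groups - 1),
        "Time x Subject (Group)": (n_times - 1) * subjects_within_groups,
    }


# Two groups (retest, practice), two testing times, 182 subjects (assumed).
for source, df in mixed_design_df(2, 182, 2).items():
    print(f"{source:<24} {df}")
```

The practice effect is carried by the Time x Group term, which is tested against Time x Subject (Group), giving 1 and 180 df under these assumptions.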

Figure 6.3 shows the make-up of the test items in the computer prac- 
tice battery and the order in which they were administered. Practice was 
given on five tests: Reaction Time 2 (Choice Reaction Time), Target 
Tracking 1, Cannon Shoot, Target Tracking 2, and Target Shoot. These tests 
were selected because they were thought to be the tests that would show 
greatest improvement with practice. All the psychomotor tests were in- 
cluded. The soldiers in the practice group received two practice sessions 
on each of the five tests and then completed the five tests as they had 
been administered to them the first time they completed the battery. Note 
that unique items (i.e., items not appearing on the full battery test) were 
used for Target Tracking 1, Target Tracking 2, and Cannon Shoot. 

Table 6.5 shows the results of the ANOVAs for the five tests included 
in the practice effects research. (We initially used separate ANOVAs 
rather than a MANOVA, knowing that they could spuriously show significant
effects where a MANOVA would not. However, when only one practice effect 
reached statistical significance, it seemed unnecessary to run the more 
conservative MANOVA.) These results show only one statistically signifi- 
cant practice effect, the Mean Log Distance score on Target Tracking 2. 
Three findings for Time were statistically significant, indicating that 
scores did change with a second testing, whether or not practice trials 
intervened between the two tests. Finally, note that the omega-squared 
values show that relatively small amounts of test score variance are ac- 
counted for by the Group, Time or Time x Group factors, also demonstrating 
the insignificance of practice effects. 

Table 6.6 shows further analyses of the practice experimental data. 
Gain scores and test-retest reliability coefficients were computed for the 
retest and practice groups, and tests for significant differences between 
the two groups were performed. Note that the difference between the gain 
scores for the retest and practice groups reached statistical significance 
only for the distance score for Target Tracking 2, reflecting the same 
finding in Table 6.5. 
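
A gain score of the kind reported in Table 6.6 can be sketched as a standardized mean difference between Time 1 and Time 2. The exact pooling used in the report is not spelled out beyond "the pooled standard deviation", so pooling the Time 1 and Time 2 variances is an assumption here, as are the function name and the sample scores.

```python
from math import sqrt
from statistics import mean, variance


def gain_effect_size(time1, time2, lower_is_better=False):
    """Standardized gain from Time 1 to Time 2 using a pooled SD.

    The sign is reflected for scores where smaller values mean better
    performance (reaction time, distance, time error), so that a positive
    result always denotes improvement.
    """
    n1, n2 = len(time1), len(time2)
    pooled_var = ((n1 - 1) * variance(time1) + (n2 - 1) * variance(time2)) / (
        n1 + n2 - 2
    )
    gain = (mean(time2) - mean(time1)) / sqrt(pooled_var)
    return -gain if lower_is_better else gain


# Hypothetical reaction times (lower is better); not data from the report.
t1 = [52.0, 48.0, 50.0, 55.0]
t2 = [50.0, 47.0, 49.0, 52.0]
print(round(gain_effect_size(t1, t2, lower_is_better=True), 2))
```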

These data suggest that the practice intervention was not a particu- 
larly strong one. It should be noted, though, that on some tests subjects' 
performance actually deteriorated from Time 1 to Time 2. The average gain 
score for the two groups across the five dependent measures was only .09 
standard deviations. This suggests either that the tasks used in these 
tests are resistant to practice effects, or that performance on these tasks 
reaches a maximum level of proficiency after only a few trials. Also, 
recall that analyses of the PTB cognitive paper-and-pencil tests (see Table 
4.3) showed gain scores that were as high as or higher than those found 
here. Perhaps gain in scores through retesting or practice is of even less 
concern for computerized tests than for paper-and-pencil tests.

Group 1 (retest group):    T1 ----------------------------> T2
Group 2 (practice group):  T1 --> Practice (Five Tests) --> T2   (N = 74)
Interval between T1 and T2: Two Weeks

ANOVA

Source                        df
Group                         A-1
Subjects (Group)              (B-1)A
Time                          C-1
Time x Group                  (C-1)(A-1)
Time x Subject (Group)        (C-1)(B-1)A

Practice Effect = Significant Time x Group Interaction

Figure 6.2 Experimental design of the practice effects investigation.



Test                 No. of Items    Comments

Demographics               5         Same as in the Test Battery
Reaction Time 2           15         Same as in the Test Battery
Target Tracking 1         15         Unique items
Cannon Shoot              24         Unique items
Target Tracking 2         15         Unique items
Target Shoot              20         Odd-numbered items from the Test Battery
Reaction Time 2           15         Same as in the Test Battery
Target Tracking 1         15         Unique items
Cannon Shoot              24         Unique items
Target Tracking 2         15         Unique items
Target Shoot              20         Odd-numbered items from the Test Battery
Reaction Time 2           15         Same as in the Test Battery
Target Tracking 1         27         Same as in the Test Battery
Cannon Shoot              48         Same as in the Test Battery
Target Tracking 2         27         Same as in the Test Battery
Target Shoot              40         Same as in the Test Battery

Figure 6.3 Items in the Computer Practice Battery used at the
Fort Knox Field Test.

Table 6.5

Effects of Practice on Selected Computer Test Scores

                      Dependent            Source of                          Omega
Test                  Measure              Variance        df         F       Squared

Choice Reaction Time  Trimmed Mean         Group          1,180      9.71*     .032
                      Reaction Time        Time           1,180     25.70*     .035
                                           Time x Group   1,180       .73       --

Target Tracking 1     Mean Log Distance    Group          1,178       .73       --
                                           Time           1,178      9.26*     .005
                                           Time x Group   1,178      4.11       --

Target Tracking 2     Mean Log Distance    Group          1,178       .47       --
                                           Time           1,178      1.30       --
                                           Time x Group   1,178      7.79*     .005

Cannon Shoot          Time Error           Group          1,171      3.79       --
                                           Time           1,171       .16       --
                                           Time x Group   1,171      5.72       --

Target Shoot          Mean Log Distance    Group          1,171       .41       --
                                           Time           1,171      9.28*     .012
                                           Time x Group   1,171       .08       --

*Denotes significance at p<.01.



Next, Table 6.6 shows that the test-retest stability for all five 
dependent measures was greater for the retest group than for the practice 
group. (While the difference between the stability coefficients for the 
two groups was statistically significant for only one of the dependent 
measures, the test was not very powerful; statistical significance required 
a difference of approximately .40 between the two stabilities.) Closer 
inspection of the data shows that the stability coefficients for the two 
groups were very nearly equal for the three "distance" dependent measures. 
Thus, it appears that the rank-ordering of subjects' performance on psycho- 
motor tests is not greatly affected by practice. 
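
One common way to test the difference between two independent correlations, such as the stability coefficients of the retest and practice groups, is Fisher's r-to-z transformation. The report does not name the test it used, so treating it as a Fisher z test is an assumption, and the sample sizes in the example are hypothetical.

```python
from math import atanh, sqrt


def fisher_z_difference(r1, n1, r2, n2):
    """Z statistic for the difference between two independent correlations.

    Each r is converted with Fisher's transformation (atanh); the standard
    error of the difference depends only on the two sample sizes.
    """
    standard_error = sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (atanh(r1) - atanh(r2)) / standard_error


# Hypothetical group sizes; the stabilities are the Choice Reaction Time
# values shown in Table 6.6 (.56 for retest, .36 for practice).
z = fisher_z_difference(0.56, 108, 0.36, 74)
print(f"Z = {z:.2f}")
```

Because the standard error shrinks only with the square root of the sample sizes, two groups of roughly this size need a large gap between their stabilities before Z exceeds the two-tailed .01 critical value of 2.58.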

Another method for examining practice effects is to look at the corre- 
lations between items or parts within a test. This was done for Target 
Tracking Tests 1 and 2. Each test was divided into three parts corre- 
sponding to test items 1-9, 10-18, and 19-27. A distance score was then 
computed for each of the three parts. Table 6.7 shows the intercorrela-
tions among the three part scores for both tests for both Time 1 and
Time 2. (Time 2 data were taken from the retest group only; the practice 
group's data were not included.) 
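
The part-score analysis can be sketched as follows: split each subject's 27 item scores into three blocks of nine, average each block, and correlate the block scores across subjects. Using a simple mean as the part score is an assumption (the report's distance score is a mean log distance), and all names and data below are hypothetical.

```python
from math import sqrt
from statistics import mean


def part_scores(item_scores, part_size=9):
    """Average item scores within consecutive blocks (items 1-9, 10-18, 19-27)."""
    return [
        mean(item_scores[i:i + part_size])
        for i in range(0, len(item_scores), part_size)
    ]


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def part_intercorrelations(subjects):
    """Lower-triangle correlations among part scores, keyed by (row, column)."""
    columns = list(zip(*(part_scores(s) for s in subjects)))
    return {
        (i, j): pearson_r(columns[i], columns[j])
        for i in range(len(columns))
        for j in range(i)
    }


# Three made-up subjects with 27 item scores each (not data from the report).
subjects = [
    list(range(27)),
    [2 * x for x in range(27)],
    [3 * x + 1 for x in range(27)],
]
for pair, r in sorted(part_intercorrelations(subjects).items()):
    print(pair, round(r, 3))
```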

If the ability requirements of the tracking task were changing due to 
practice during the course of the test, one would expect to find that the 
correlation between items 1-9 and items 19-27 would be lower than either of 
the two correlations involving items 10-18. This did not occur. While 

Table 6.6

Gain Scores and Reliabilities for Retest and Practice Groups(a)

                                                                 F for                 Z for
                      Dependent                       Gain       Gain                  Reliabil-
Test                  Measure             Group       Score(b)   Scores   Reliability  ities(c)

Choice Reaction Time  Trimmed Mean        Retest       -.36       .73        .56        1.64
                      Reaction Time       Practice     -.43                  .36

Target Tracking 1     Mean Log Distance   Retest        .07      4.11        .68         .46
                                          Practice      .33                  .64

Target Tracking 2     Mean Log Distance   Retest       -.09      7.79*       .77         .16
                                          Practice      .21                  .76

Cannon Shoot          Time Error          Retest        .34      5.72        .66        1.50
                                          Practice     -.11                  .51

Target Shoot          Mean Log Distance   Retest        .21       .08        .58         .88
                                          Practice      .26                  .48

(a) Inferential statistics significant at p < .01 are denoted with an asterisk (*).

(b) Gain scores are effect size estimates and were computed using the pooled
standard deviation. Signs were reflected as necessary so that a positive
gain score denotes "improvement" from Time 1 to Time 2.

(c) Given the sizes of the retest and practice samples, statistical significance
(at p < .01) will not be attained until the difference between the two
reliabilities reaches approximately .40.

Table 6.7

Intercorrelations Among Items 1-9, Items 10-18, and Items 19-27 of Target
Tracking Tests 1 and 2

                         Target Tracking Test 1

                      Time 1                            Time 2
               Items   Items   Items             Items   Items   Items
               1-9     10-18   19-27             1-9     10-18   19-27

Items 1-9
Items 10-18     .87                               .91
Items 19-27     .80     .87                       .92     .92

                         Target Tracking Test 2

                      Time 1                            Time 2
               Items   Items   Items             Items   Items   Items
               1-9     10-18   19-27             1-9     10-18   19-27

Items 1-9
Items 10-18     .83                               .86
Items 19-27     .85     .89                       .85     .91

there is a slight tendency for the correlation between items 10-18 and
items 19-27 to be the highest of the three intercorrelations, the differ-
ence between the highest and lowest correlation within each test averages
only .05. Data in Table 6.1 show that the Spearman-Brown corrected split-
half reliability of both tests is .97, suggesting that all of the items
within each test are measuring the same underlying ability.
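
For reference, the Spearman-Brown correction mentioned above steps a half-test correlation up to the reliability of the full-length test. A minimal sketch, with a hypothetical function name and an illustrative input value:

```python
def spearman_brown(r_half, length_factor=2.0):
    """Spearman-Brown prophecy formula.

    With length_factor=2 this converts the correlation between two test
    halves into the estimated reliability of the full-length test.
    """
    return length_factor * r_half / (1.0 + (length_factor - 1.0) * r_half)


# Illustrative value only: a half-test correlation of .50 steps up to .667.
print(round(spearman_brown(0.50), 3))
```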

In summary, data from the practice experiment indicate that scores
from computerized psychomotor tests appear to be quite stable over a two-
week period. Practice does have some effect on test scores, but it appears
to be relatively small. Certainly it does not seem strong enough to war-
rant serious concern about the usefulness of the tests.

COVARIANCE ANALYSES WITH ASVAB SUBTESTS AND COGNITIVE PAPER-AND-PENCIL TESTS

Table 6.8 contains the intercorrelations for the ASVAB subtests, 
paper-and-pencil cognitive measures, and the computer-administered tests, 
which include both perceptual and psychomotor measures. Scores on the AFQT 
are also included. These correlations are based on the Fort Knox field 
test sample but include only those subjects with test scores available on
all variables (N = 168).

In examining these relationships, we first looked at the correlations 
between tests within the same battery. As was discussed in Chapter 4, 
correlations between ASVAB subtest scores range from .02 to .74 (absolute
values), and correlations between the cognitive paper-and-pencil test 
scores range from .27 to .67. For the perceptual computer-administered 
test scores, correlations range from .00 to .83 (absolute terms). Note 
that the highest values appear for correlations between scores computed 
from the same test; for example, the correlation between Short-Term Memory 
reaction time and intercept is .83, and the correlation between Perceptual 
Speed and Accuracy slope and reaction time is .82. Correlations between
the psychomotor computer-administered variables range from .15 to .81 (ab- 
solute terms). Note that scores on the two tracking tests correlate the 
highest. 

Perhaps the most important question to consider is the overlap between
the different groups of measures. Do the paper-and-pencil measures and
computer-administered tests correlate highly with the ASVAB and with each
other, or are they measuring unique or different abilities? To address this
question, in part, we examined the intercorrelations between the ASVAB,
including AFQT, and other groups of tests.

As noted in Chapter 4, for the cognitive paper-and-pencil tests these 
correlations range from .01 (Assembling Objects and Number Operations) 
to .63 (Orientation 3 and Mechanical Comprehension), with a mean correla-
tion of .33 (see Table 6.9 for a summary of the correlation statistics).
Across all PTB paper-and-pencil tests, ASVAB Mechanical Comprehension ap-
pears to correlate the highest with the new tests; across all ASVAB sub-
tests, PTB Orientation 3 yields the highest correlations.

The correlations between the ASVAB subtests and the computer-adminis-
tered perceptual tests, in absolute terms, range from .00 (Paragraph Com-
prehension with Perceptual Speed and Accuracy Reaction Time and with Short-
Term Memory Intercept, and General Science with Perceptual Speed and Ac-
curacy Slope) to .58 (Arithmetic Reasoning and Number Memory Percent Cor-
rect). The mean of these 165 correlations is .15 (SD = .12). Across all
ASVAB subtests, scores on the Short-Term Memory Reaction Time and Slope
yield the lowest correlations. The highest values appear for Number Memory
Percent Correct and Reaction Time.

The correlations between ASVAB subtests and psychomotor scores range
from .00 (Coding Speed with Target Shoot Time and Target Shoot Distance) to
.44 (Mechanical Comprehension and Tracking 1). The mean of these 44
correlations (absolute values) is .17 (SD = .12). Note that for the most
part, these four PTB variables yield the highest correlations with ASVAB
Mechanical Comprehension and Electronics Information. The lowest correla-
tions appear for Paragraph Comprehension, Number Operations, and Coding
Speed.

Table 6.8

Intercorrelations Among the ASVAB Subtests and the Pilot Trial Battery
Cognitive Paper-and-Pencil and Perceptual/Psychomotor Computer-Administered
Tests: Fort Knox Sample
(N = 168)

The intercorrelations between the PTB cognitive paper-and-pencil tests
and the computerized tests in general range from .00 to .46 (in absolute
terms). The mean of the 40 psychomotor/cognitive paper-and-pencil test
score correlations is .24 (SD = .11). The mean of the 150 perceptual
computer score/cognitive paper-and-pencil test score correlations is .19
(SD = .11). The computerized test variables that correlate consistently
highly with the paper-and-pencil tests include Target Identification Reac-
tion Time, Number Memory Percent Correct and Reaction Time, Tracking 1, and
Tracking 2.

Intercorrelations between the cognitive/perceptual computer tests and
the psychomotor computer tests range from .00 to .42 (mean = .15 and SD
= .11). The highest values appear for the correlations between the four
psychomotor measures and Target Identification Percent Correct and Short- 
Term Memory Slope. 

Table 6.9 summarizes the correlational data in Table 6.8 that we 
discussed just above. The values in the two tables and the discussion lead 
to the conclusion that the various types of measures do not overlap exces- 
sively, and, therefore, do appear to each make separate contributions to 
ability measurement. 
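
The summary statistics in Table 6.9 can be thought of as descriptive statistics over every pairing of a variable from one group with a variable from another. The sketch below assumes the correlation matrix is stored as a nested dictionary; the variable names and values are hypothetical. The number of correlations is simply the product of the two group sizes, which is consistent with the counts in Table 6.9 (for example, 150 would arise from 10 paper-and-pencil scores crossed with 15 perceptual computer scores).

```python
from statistics import mean, stdev


def cross_block_summary(corr, block_a, block_b):
    """Summarize absolute correlations between two groups of variables.

    `corr[a][b]` holds the correlation between variable a and variable b.
    """
    values = [abs(corr[a][b]) for a in block_a for b in block_b]
    return {
        "n": len(values),
        "mean": mean(values),
        "sd": stdev(values),
        "min": min(values),
    }


# Hypothetical correlations between two ASVAB subtests and two new tests.
corr = {
    "AR": {"Tracking 1": -0.20, "Maze": 0.40},
    "MC": {"Tracking 1": 0.00, "Maze": -0.60},
}
summary = cross_block_summary(corr, ["AR", "MC"], ["Tracking 1", "Maze"])
print(summary["n"], round(summary["mean"], 2), round(summary["min"], 2))
```

Taking absolute values first matters because several scores (reaction times, distances) are scaled so that lower is better, which would otherwise let positive and negative correlations cancel in the mean.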

Perceptual, and Psychomotor Abilities

                                                  Number of      Mean*         SD* of         Minimum*
Types of Scores Correlated                        Correlations   Correlation   Correlations   Correlation

ASVAB Subtests and PTB Cognitive                      110           .33           .14            .01
Paper-and-Pencil Tests

ASVAB Subtests and PTB Cognitive/                     165           .15           .12            .00
Perceptual Computer-Administered Tests

ASVAB Subtests and PTB Psychomotor                     44           .17           .12            .00
Computer-Administered Tests

PTB Cognitive Paper-and-Pencil Tests and              150           .19           .11            .00
PTB Perceptual Computer-Administered Tests

PTB Cognitive Paper-and-Pencil Tests and               40           .24           .11            .01
PTB Psychomotor Computer-Administered Tests

PTB Perceptual Computer-Administered Tests and         60           .15           .11            .00
PTB Psychomotor Computer-Administered Tests

* These statistics are based on absolute correlation values.

FACTOR ANALYSIS OF PTB COGNITIVE PAPER-AND-PENCIL MEASURES,
PTB PERCEPTUAL-PSYCHOMOTOR COMPUTER-ADMINISTERED TESTS,
AND ASVAB SUBTESTS

In addition to examining intercorrelations, we also examined results
from a factor analysis of scores of the ASVAB, cognitive paper-and-pencil
measures, and computer-administered tests. Two variables, Perceptual Speed
and Accuracy Reaction Time and Short-Term Memory Reaction Time, were
omitted from this analysis because these scores correlated very highly with
their corresponding Slope or Intercept variables; retaining them risked
producing communalities greater than one.

Results from the seven-factor solution of a principal components factor 
analysis with varimax rotation are displayed in Table 6.10. All loadings 
of .30 or greater are shown. Our interpretation of these data, by factor, 
is as follows. 

o Factor 1 includes eight of the ASVAB subtests (General Science,
Arithmetic Reasoning, Word Knowledge, Paragraph Comprehension,
Automotive Shop, Mathematical Knowledge, Mechanical Comprehension,
and Electronics Information), six of the cognitive paper-and-pencil
measures (Assembling Objects, Reasoning 1 and 2, and Orientation 1,
2, and 3) and two perceptual computer variables (Number Memory
Percent Correct and Reaction Time). Because this factor contains
measures of verbal, numerical, and reasoning ability we have termed
this "g", or a general ability factor.

o Factor 2 includes all of the PTB cognitive paper-and-pencil mea-
sures, Mechanical Comprehension from the ASVAB, and Target Identi-
fication Reaction Time from the computer tests. We called this a
general spatial factor.

o Factor 3 has major loadings on the three psychomotor tests
(Tracking 1, Tracking 2, and Target Shoot Distance), with sub-
stantially smaller loadings from three cognitive/perceptual com-
puter test variables (Target Identification Reaction Time, Short-
Term Memory Intercept, and Cannon Shoot Time Error), the Path Test,
and Mechanical Comprehension from the ASVAB. Given the high
loadings of the psychomotor tests on this factor, we refer to this
as the motor factor.

o Factor 4 includes variables from the cognitive/perceptual computer
tests. These include PS&A Percent Correct, Slope, and Intercept;
Target Identification Percent Correct; and Short-Term Memory Per-
cent Correct. This factor appears to involve accuracy of percep-
tion across several tasks and types of stimuli.

o Factor 5 contains variables from the perceptual computer tests,
including Simple Reaction Time RT, Choice Reaction Time RT, Short-
Term Memory Intercept, PS&A Intercept and Percent Correct, and
Target ID RT. Also loading on this factor is a cognitive paper-and-
pencil test, Orientation 2. This factor is not very clear, but the
highest loadings are on straightforward reaction time measures, so
we interpret this as a speed of reaction factor.

Table 6.10

Principal Components Factor Analysis of Scores of the ASVAB Subtests,
Cognitive Paper-and-Pencil Measures, and Cognitive/Perceptual and Psychomotor
Computer-Administered Tests(a)
(N = 188)



Variable 

ASVAB 

GS
AR
WK
PC
NO
CS
AS
MK
MC
EI

COGNITIVE PAPER-
AND-PENCIL



Factor 1  Factor 2  Factor 3  Factor 4  Factor 5  Factor 6  Factor 7



75 
75 
77 
62 



62 
77 
63 
72 



84 

62 



38 



-30 



59 
73 
62 
47 
77 
44 
58 
70 
68 
65 



Assemb Obj


35 


69 


Obj Rotation 




-61 


Shapes 




66 


Maze




70 


Path 




67 


Reason 1 


37 


58 


Reason 2 


37 


47 


Orient 1 


37 


64 


Orient 2 


40 


46 


Orient 3 


60 


52 



PERCEPTUAL 
COMPUTER 

SRT-RT
CRT-RT
PS&A-PC
PS&A Slope
PS&A Inter
Target ID-PC
Target ID-RT
STM-PC
STM-Slope
STM-Int

Cannon Shoot-TE
No Mem-PC
No Mem-RT



•30 



-41 



53 
-37 



37 



38 
32 



67 
88 
-6S 
40 

39 



-30 



63 
61 
31 

50 

30 

51 



34 
41 



37 
-46 



66 
49 
51 
67 
65 
54 
44 
58 
52 
67 



44 

50 
70 
81 
74 
25 
57 
41 
25 
47 
19 
52 
54 



PSYCHOMOTOR 
COMPUTER 

Tracking 1 
Tracking 2 
Target Shoot-TF
Target Shoot-Dist

Variance 
Explained 5.69 



4.70 



86 
77 

64 



2.83 



2.37 



1.92 



1.87 



42 



1.17 



NOTE: Decimals have been omitted from factor loadings.

(a) Note that the following variables were not included in this factor
analysis: AFQT, PS&A Reaction Time, and Short-Term Memory Reaction
Time.

(b) Communality (sum of squared factor loadings) for variables.

o Factor 6 contains four variables, two from the ASVAB (Number Opera-
tions and Coding Speed) and two from the perceptual computer tests
(Number Memory Percent Correct and Reaction Time). This factor
appears to represent both speed of reaction and arithmetic ability.

o Factor 7 contains three variables from the computer-administered
tests: Short-Term Memory Percent Correct and Slope, and Target
Shoot Time to Fire. This factor is difficult to interpret, but we
believe it may represent a response style factor. That is, this
factor suggests that those individuals who take a longer time to
fire on the Target Shoot Test also tend to have higher slopes on
the Short-Term Memory Test (lower processing speeds with increased
bits of information) but are more accurate or obtain higher percent
correct values on Short-Term Memory.

Note that several variables--Target Identification Percent Correct,
Short-Term Memory Percent Correct, Cannon Shoot Time Error, and Target
Shoot Time to Fire--have fairly low communalities. This may be due to
relatively low score variance or reliability, but it could also be due to
those variables having unique variance, at least when factor analyzed with
this set of tests. We think this latter explanation is highly plausible
for the Cannon Shoot score.
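
Since Table 6.10 defines communality as the sum of squared factor loadings for a variable, the quantity is easy to sketch. The loadings below are hypothetical and are not values from the table; the variance explained by a factor is computed the same way, but down a column rather than across a row.

```python
def communalities(loadings):
    """Sum of squared loadings across factors, one value per variable (row)."""
    return [sum(value ** 2 for value in row) for row in loadings]


def variance_explained(loadings):
    """Sum of squared loadings across variables, one value per factor (column)."""
    n_factors = len(loadings[0])
    return [sum(row[j] ** 2 for row in loadings) for j in range(n_factors)]


# Hypothetical two-factor loadings for three variables (decimals included).
loadings = [
    [0.75, 0.10],
    [0.35, 0.69],
    [0.20, 0.15],  # loads weakly on both factors, so its communality is low
]
print([round(h2, 2) for h2 in communalities(loadings)])
print([round(v, 2) for v in variance_explained(loadings)])
```

A variable with a low communality, like the third row here, shares little variance with the retained factors, which is the pattern the text describes for the Cannon Shoot score.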

This concludes the discussion of the pilot testing and the Fort Knox 
field test of the cognitive paper-and-pencil tests and the computer- 
administered tests in the Pilot Trial Battery. We turn now to a discussion 
of the non-cognitive measures in Chapters 7 and 8. 

CHAPTER 7



NON-COGNITIVE MEASURES: PILOT TESTING 
Leaetta H. Hough, Bruce N. Barge, and John D. Kamp 



GENERAL 

In this chapter, we describe the development and pilot testing of the 
non-cognitive measures prepared for inclusion in the Pilot Trial Battery.
All are paper-and-pencil measures. The inventories developed tap con-
structs in the temperament, interest, and life history (biodata) domains.
Field testing of these measures is covered in Chapter 8. 

The non-cognitive measures were pilot tested at Fort Campbell and Fort
Lewis in the spring of 1984. In addition to the newly developed measures, 
four published, marker measures of temperament were utilized in the pilot 
tests. Chapter 2 contains a detailed description of the pilot test proce- 
dures and samples and we do not repeat that discussion here. The pilot 
test results are discussed later in this chapter; we first discuss the 
desired characteristics of these measures. 

Desired Characteristics 

As described in Chapter 1, the Task 2 research team extensively re- 
viewed the literature and the existing tests and constructs available in 
the non-cognitive area as well as in the cognitive and psychomotor areas. 
The literature review served to identify non-cognitive constructs most 
relevant and important for the prediction of success in a variety of Army
MOS (Hough, Kamp, & Barge, 1985). 

In the non-cognitive area, there was particular interest in predicting 
adjustment criteria, such as attrition, job satisfaction, and unfavorable 
discharge/disciplinary action, as well as job and training performance. 
Attention to adjustment criteria was important in the development of non- 
cognitive predictors because these criteria are typically not highly re- 
lated to scores on cognitive or perceptual/psychomotor tests. Non-cog- 
nitive measures were also seen as valuable for use in classification. The 
expert judgment research (see Chapter 1) indicated the importance of in- 
cluding measures of several non-cognitive constructs. Following these 
explorations, the IPR meeting in March 1984 resulted in the identification 
of a set of non-cognitive constructs to be developed for the Pilot Trial 
Battery. (See Figure 1.5.) 

Development of the non-cognitive measures was guided by several impor- 
tant, yet sometimes conflicting, goals. First, it was desired that the 
scales have construct validity. Item content of each scale should be heter-
ogeneous enough to cover all important aspects of the targeted construct,
yet homogeneous enough to be interpretable and distinct from other con- 
structs. In addition, the scales should be a valid assessment of the 
respondent's standing on the construct, rather than merely a reflection of 
social desirability. 

Other important considerations during the development of the inventories
included reliability and stability. The scales were to be both
internally consistent and stable over time (test-retest). The measures 
should also be stable over situations, so that faking or differing response 
sets would not greatly distort the scores obtained. Items and scales 
should elicit sufficient variance in responses that the scores could be 
used to differentiate respondents. It was important that the item content 
be non-objectionable. Finally, it was extremely important that the mea- 
sures be able to demonstrate validity in predicting the respondent's 
standing on various job performance and other important criteria. 

ABLE and AVOICE 

The above set of desired characteristics formed the basis for the 
development of the scales to be described in this chapter. Our discussion 
of these scales is divided into two areas that correspond to the two 
inventories that were employed. The ABLE (Assessment of Background and 
Life Experiences) contains items that assess the important constructs of 
the temperament and life history (biodata) domains. The items on the ABLE 
are all new items written by PDRI researchers. Each item was written to 
tap one of the constructs identified via the literature review and other 
earlier phases of the project (see above and Chapter 1). Many candidate 
items were written. These were reviewed by the entire non-cognitive team 
and the best appearing items were selected for initial inclusion on the 
ABLE. The main criteria for item selection were: the item was clearly 
relevant for measuring a targeted construct; it was clearly written, and 
content was non-objectionable. The AVOICE (Army Vocational Interest Career 
Examination) measures the relevant constructs of the interest domain. The 
AVOICE is a significantly modified version of the VOICE (Vocational In- 
terest Career Examination) which had been developed and researched by the 
U.S. Air Force (Alley & Matthews, 1982). In general, items were modified
to measure interests that seemed more appropriate to Army occupations. 
Items were also written to tap interests that were not included on the 
VOICE. We describe the constructs, scales, and pilot test results of the 
ABLE first, and then do the same for the AVOICE. 

The constructs chosen for the battery are described with examples of 
the item content for each construct scale; any revisions made on the basis 
of the pilot tests are discussed. Data obtained during the pilot testing 
are reported, including means, standard deviations, reliabilities, scale 
intercorrelations, factor analysis results, gender and race differences,
and, when available, correlations with marker tests. Finally, the non- 
cognitive measures and the results obtained with them are summarized. 

TEMPERAMENT/BIODATA CONSTRUCTS



Before discussing constructs that underlie the development of the 
ABLE, we need to explain how and why the inventory combines the two domains 
of temperament and biodata. Primarily, this action was taken to capitalize 
on the complementary strengths and weaknesses of each domain. The differ-
ences that exist between them allow each to contribute unique information
to an assessment, and yet are not so large as to preclude a unified inven-
tory, as described in Chapter 1.

Temperament and biodata differ from each other along the sign/sample 
continuum proposed by Wernimont and Campbell (1968). Biodata items are 
best viewed as a sample of past behavior that may predict future behavior 
in a similar situation. Temperament measures are most often a sign, or an 
indicator, of a predisposition to behave in certain ways. Thus, each type 
of information is geared toward predicting future behavior, but each does 
it from a somewhat different perspective along the sign/sample continuum. 

Temperament and biodata may also differ in the emphasis placed on 
conceptual understanding. The study of temperament has, over the years, 
attached importance to the measurement of constructs and the understanding
associated with such measurement. Biodata, by contrast, has typically been 
employed in situations requiring maximal criterion-related validity but 
little resulting understanding. 

In short, temperament and biodata both are used to predict an indivi-
dual's future behavior, but from different viewpoints and perhaps for
differing reasons. The distinctions between items from the two domains are
not sharp, so merging of the two sets is feasible. Yet their respective
strengths complement each other when combined in a unified fashion, as in
the ABLE.

In this section, we discuss the six temperament/biodata constructs as-
sessed by the ABLE, the physical condition construct, and the response
validity scales that were developed. Table 7.1 shows these eight cate- 
gories and the 15 scales that fall under them. 

Strictly speaking, the physical condition construct does not fit into 
the temperament/biodata domain in the same way that the other constructs 
do. It is a highly specific construct that does not have the relatively 
extensive, prior research history that the other constructs have. It was 
included, however, because the construct was seen as important for Army
occupations and because we could not measure physical condition directly as 
part of this research project. The ABLE seemed the best instrument for 
collecting the physical condition measure, and so it was included as one of 
the target constructs. 

Table 7.1

Temperament/Biodata Scales (by Construct) Developed for Pilot Trial Battery:
ABLE - Assessment of Background and Life Experiences

Construct                      Scale

Adjustment                     Emotional Stability

Dependability                  Nondelinquency
                               Traditional Values
                               Conscientiousness

Achievement                    Work Orientation
                               Self-Esteem

Physical Condition             Physical Condition

Leadership (Potency)           Dominance
                               Energy Level

Locus of Control               Internal Control

Agreeableness/Likeability      Cooperativeness

Response Validity Scales       Non-Random Response
                               Unlikely Virtues (Social Desirability)
                               Poor Impression
                               Self-Knowledge

When used in the initial pilot testing at Fort Campbell, the ABLE
included a total of 291 items. It was shortened to 268 items for the later
Fort Lewis pilot test. (See Chapter 2 for detailed information on the
procedures and samples for these pilot tests.) Most of these items have
three response options that reflect a continuum of the construct in
question. The response option that reflects the highest level of the construct
(e.g., most dominant) is scored as a 3, while the middle response option is
scored as a 2 and the lowest level response is scored as a 1. The direction
of scoring differs from item to item, so the first response option is
sometimes high on the construct (i.e., scored as a 3) and sometimes low
(scored as a 1), to prevent response bias.
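
The scoring rule just described can be illustrated with a short sketch. The Python below is an illustration only, not the project's scoring program: the function names and the per-item `high_first` flag are hypothetical, and the sketch assumes a three-option item whose options are numbered 1 to 3 in printed order.

```python
from typing import Sequence


def score_item(response: int, high_first: bool) -> int:
    """Score one three-option item on the 1-3 construct continuum.

    response   -- option chosen, numbered 1..3 in printed order
    high_first -- hypothetical key flag: True when the first printed
                  option reflects the HIGHEST level of the construct
    """
    if response not in (1, 2, 3):
        raise ValueError("response must be 1, 2, or 3")
    # When the first option is the high end, reverse the numbering so
    # that 3 always means "highest level of the construct".
    return 4 - response if high_first else response


def score_scale(responses: Sequence[int], high_first_flags: Sequence[bool]) -> int:
    """Sum the keyed item scores for one scale."""
    return sum(score_item(r, f) for r, f in zip(responses, high_first_flags))
```

Under these assumptions, a respondent who picks the first option on an item keyed with `high_first=True` receives a 3, and the same choice on an item keyed the other way receives a 1.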

We now discuss each construct in turn and the scales developed to tap 
that construct. The description of the number of items on each scale 
refers to the Fort Campbell version. 

Adjustment 

Adjustment is defined as the amount of emotional stability and stress
tolerance that one possesses. The well-adjusted person is generally calm,
displays an even mood, and is not overly distraught by stressful
situations. He or she thinks clearly and maintains composure and rationality in
situations of actual or perceived stress. The poorly adjusted person is
nervous, moody, and easily irritated, tends to worry a lot, and "goes to
pieces" in times of stress.

The scale included under the Adjustment construct is called Emotional
Stability. It is a 31-item scale that contains items such as:

• Have you ever felt sick to your stomach when you thought about
something you had to do?

• Do you handle pressure better than most other people?

The scale is designed to assess a person's characteristic affect and 
ability to cope effectively with stress. 

Dependability 

The Dependability construct refers to a person's characteristic degree
of conscientiousness. The dependable person is disciplined,
well-organized, planful, respectful of laws and regulations, honest,
trustworthy, wholesome, and accepting of authority. Such a person prefers order
and thinks before acting. The less dependable person is unreliable, acts
on the spur of the moment, and is rebellious and contemptuous of laws and
regulations. Three ABLE scales fall under the Dependability construct:
Nondelinquency, Traditional Values, and Conscientiousness.

Nondelinquency is a 24-item scale that assesses how often a person has
violated rules, laws, or social norms. It includes items such as:

• How often have you gotten into fights?

• Before joining the Army, how hard did you think learning to take 
orders would be? 

• How many times were you suspended or expelled from high school? 

Traditional Values, a 19-item scale under the Dependability construct,
contains items such as the following:

• Are you more strict about right and wrong than most people your
age?

• People should have greater respect for authority. Do you agree?

These items assess how conventional or strict a person's value system
is, and how much flexibility he/she has in this value system.

Conscientiousness, the third scale falling under the Dependability 
construct, contains 24 items. This scale assesses the respondent's degree 
of dependability, as well as the tendency to be organized and planful. 
Items include: 

• How often do you keep the promises you make? 

• How often do you act on the spur of the moment?

Achievement



The Achievement construct is defined as the tendency to strive for
competence in one's work. The achievement/work-oriented person works hard,
sets high standards, tries to do a good job, endorses the work ethic, and
concentrates on and persists in completion of the task at hand. This
person is also confident, feels success from past undertakings, and expects
to succeed in the future. The less achievement-oriented person has little
ego involvement in his or her work, feels incapable and self-doubting, does
not expend much effort, and does not feel that hard work is desirable.
Two scales fall under the Achievement construct. 

The 31-item scale entitled Work Orientation addresses how long, hard, 
and well the respondent typically works and also how he/she feels about 
work. Among the scale items are these: 

• How often do you give up on a difficult problem? 

• How hard were you willing to work for good grades in high school?

• How important is your work to you? 

The other scale pertaining to Achievement is called Self-Esteem, a 16- 
item scale that measures how much a person believes in himself/herself and 
how successful he/she expects to be in life. Items from this scale in- 
clude: 

• Do you believe you have a lot to offer the Army? 

• Has your life so far been pretty much a failure?

Physical Condition

The optimal way to establish physical condition is, of course, to 
administer physical conditioning tests. Since such a program could not be 
a part of the Trial Battery, however, it was decided to ask self-report
questions through which soldiers could indicate their perceived physical
fitness levels. As noted earlier, the construct of physical condition was
included in the ABLE because it was the best tool available to collect such
self-report data.

The Physical Condition construct refers to one's frequency and degree
of participation in sports, exercise, and physical activity. Individuals
high on this dimension actively participate in individual and team sports
and/or exercise vigorously several times per week. Those low on this
dimension have participated only minimally in athletics, exercise
infrequently, and prefer the elevator to the stairs.

The scale developed to tap this construct is also called Physical
Condition, and includes 14 items. The items assess how vigorously,
regularly, and well the respondent engages in physical activity. These items
are included on the scale:

• Prior to joining the Army, how did your physical activity (work and
recreation) compare to most people your age?

• Before joining the Army, how would you have rated your performance
in physical activities?

Leadership (Potency) 

This construct is defined as the degree of impact, influence, and 
energy that one displays in relation to other people. The person high on 
this characteristic is appropriately forceful and persuasive, is optimistic
and vital, and has the energy to get things done. The person low on this 
characteristic is timid about offering opinions or providing direction and 
is likely to be lethargic and pessimistic. 

Two ABLE scales are associated with the Leadership construct:
Dominance and Energy Level. Dominance is a 17-item scale that includes such
items as: 

• How confident are you when you tell others what to do? 

• How often do people turn to you when decisions have to be made? 

The scale assesses the respondent's tendency to take charge or to 
assume a central and public role. 

The other Leadership scale, entitled Energy Level, is designed to 
measure to what degree one is energetic, alert, and enthusiastic. This 
scale includes 27 items, such as these: 

• Do you get tired pretty easily? 

• At what speed do you like to work? 

• Do you enjoy just about everything you do?

Locus of Control

The Locus of Control construct refers to one's characteristic belief 
in the amount of control he/she has or people have over rewards and
punishments. The person with an internal locus of control expects that there
are consequences associated with behavior and that people control what
happens to them by what they do. The person with an external locus of
control believes that what happens to people is beyond their personal
control.

The Internal Control scale is the only ABLE scale that taps the Locus 
of Control construct. It is a 21-item scale that assesses both internal 
and external control, primarily as they pertain to reaching success on the 
job and in life. The following are example items: 

• Getting a raise or a promotion is usually a matter of luck. Do you
agree?

• Do you believe you can get most of the things you want if you work
hard enough for them?

Agreeableness/Likeability

The Agreeableness/Likeability construct is defined as the degree of
pleasantness versus unpleasantness a person exhibits in interpersonal
relations. The agreeable and likeable person is pleasant, tolerant, tactful,
helpful, not defensive, and generally easy to get along with. His/her
participation in a group adds cohesiveness rather than friction. The
relatively disagreeable and unlikeable person is critical, fault-finding,
touchy, defensive, alienated, and generally contrary.

The Cooperativeness scale is the only measure of this construct in the
ABLE, and is composed of 28 items. These items assess how easy it is to
get along with the person making the responses. Items from this scale
include:

• How often do you lose your temper?

• Would most people describe you as pleasant? 

• How well do you accept criticism? 

Response Validity Scales

The purpose of the validity scales is to provide additional
information about the way in which respondents have completed the ABLE. The
primary purpose of these scales is to determine the validity of the
responses, that is, the degree to which the responses are accurate depictions
of the person completing the inventory. Those who are responding in an
inaccurate way can be identified, and appropriate action taken. (For
example, scores on content scales could be adjusted or the subject could be
required to retake the inventory.) For those who appear to be responding
accurately, the responses can be analyzed with greater confidence.

Four validity scales are included on the ABLE: Non-Random Response,
Unlikely Virtues (Social Desirability), Poor Impression, and
Self-Knowledge. These validity scales are modeled on similar kinds of scales
that are routinely used in many measures of temperament, for example on the
Minnesota Multiphasic Personality Inventory (Dahlstrom, Welsh, &
Dahlstrom, 1975) and the California Psychological Inventory (Gough, 1975).
Each scale is discussed below.

The Non-Random Response scale is very different in content and scoring 
from other scales in the ABLE. The response options for an item do not
form a continuum that indicates more or less random responding. Rather, 
there is one right answer which is scored as a 1, while the other two 
response options are both wrong and are both scored zero. Also, the con- 
tent does not ask about oneself; instead, it asks about information that 
any person is virtually certain to know. 

Two of the eight items from the Non-Random Response scale are shown
next:

• The branch of the military that deals most with airplanes is the:

   1. Military Police
   2. Coast Guard
   3. Air Force

• Groups of soldiers are called:

   1. Tribes
   2. Troops
   3. Weapons

The intent of this scale is to detect those respondents who cannot or 
are not reading the questions, and are instead randomly filling in the 
circles on the answer sheet. Responses from those with a low score on this 
scale may be eliminated from the analyses since their responses appear to 
be random. 

The second validity scale, entitled Unlikely Virtues, is aimed at 
detecting those who respond in a socially desirable manner (i.e., "fake 
good") rather than an honest manner. There are 12 items on this scale, of
which these are a sample: 

• Do you sometimes wish you had more money? 

• Have you always helped people without even the slightest bit of 
hesitation? 

Scoring on this scale uses the continuum of response options as
described earlier, and those with a high score appear to be responding as
they think a person should rather than honestly.

Poor Impression is the third of the ABLE validity scales, and reflects
attempts to simulate psychopathology. Persons who attempt to "fake bad"
receive the most deviant scores on scales such as this, while psychiatric
patients score average or slightly higher than average. Thus, this scale is
designed to detect those respondents who wish to make themselves appear
emotionally unstable when in fact they are not unstable.

The Poor Impression scale has 23 items, most of which are also scored
on another substantive ABLE scale. Items from this scale include the 
following: 

• How much resentment do you feel when you don't get your way? 

• Did your high school classmates consider you easy to get along with?

• How often do you keep the promises that you make? 

Scoring on the scale is similar to that of the Non-Random Response
scale, in which only one of the response options is scored as a 1 and the
other two response options are scored zero. The response option scored 1 is
the option that indicates the least social desirability.

The final validity scale is the Self-Knowledge scale, which has 13
items. This scale is intended to identify people who are more self-aware,
more insightful, and more likely to have accurate perceptions about
themselves. The responses of persons high on this scale may have more validity
for predicting job criteria. The following are items from the
Self-Knowledge scale:

• Do other people know you better than you know yourself?

• How often do you think about who you are?

All three of these scales (Unlikely Virtues, Poor Impression, and
Self-Knowledge) could be used to identify suspect inventories in order to
either drop the inventory from further analysis or adjust the content
scales to take account of the scores on these scales. It was part of the
research task to collect and analyze data to determine the best way to use
these scales. In particular, the faking/fakability research, reported in
Chapter 8, was intended to fulfill this purpose.

ABLE REVISIONS BASED ON PILOT TESTING

The non-cognitive inventories were pilot tested at two of the three 
pilot test sites: Fort Campbell and Fort Lewis. Data from these pilot 
tests are presented in the following section. First, however, in the 
following paragraphs we discuss the changes made in the ABLE on the basis 
of the two pilot tests to prepare the ABLE inventory for field testing. 
The changes are discussed for the ABLE as a whole rather than by scale,
since the changes made were highly similar across scales. 

Revision of the ABLE took place in three steps. The first was
editorial revision prior to pilot testing, the second was based on Fort
Campbell results, and the third was based on Fort Lewis findings. The
editorial changes prior to pilot testing were made by PDRI, acting on
suggestions from both ARI and PDRI reviews of the instrument.

The first editorial review resulted in the deletion of 17 items and 
the revision of 158 items. These actions were made to improve the apparent 
quality of the inventory, and largely consisted of minor changes in 
wording. Many of the changes resulted in more consistency across items in 
format, phrasing, and response options, and made the inventory easier and 
faster to take. 

When the inventory was initially administered at Fort Campbell on 16 
May 1984, the respondents raised very few criticisms or concerns about the
ABLE. Several subjects did note the redundancy of the items on the Phys- 
ical Condition scale, and this 14-item scale was shortened to nine items. 
One additional item characterized as irrelevant was revised. 

Item analyses were based on data from 52 Fort Campbell subjects who 
completed the ABLE. The two statistics that were examined for each ABLE 
item were its correlation with the total scale on which it is scored and 
the endorsement frequencies for all of its response options. 

Items that failed to correlate at least .15 in the appropriate direc- 
tion with their respective scales were considered potentially weak. Items, 
other than validity scale items, for which one or more of the response 
options were endorsed by fewer than two subjects (i.e., < 4% of the sample) 
were also identified. Six items fell into the former category, 63 items
fell into the latter, and an additional 7 fell into both. All of them were
examined for revision or deletion, as appropriate.
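
The two item-screening rules above can be sketched in code. The Python below is a minimal illustration under stated assumptions, not the analysis program used in the project: the function names are hypothetical, item responses are assumed to be already scored in the keyed direction (1 to 3), and the item-total correlation is computed against a total that includes the item itself, since the text does not say whether a corrected total was used.

```python
from collections import Counter
from math import sqrt
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def flag_items(scores: Sequence[Sequence[int]],
               min_r: float = 0.15,
               min_count: int = 2,
               options: Sequence[int] = (1, 2, 3)) -> Dict[int, Dict[str, bool]]:
    """Flag items on one scale.

    scores -- one row per respondent, one keyed item score per column
    An item is flagged when its item-total correlation is below min_r,
    or when any response option is endorsed by fewer than min_count people.
    """
    totals = [sum(row) for row in scores]
    flags: Dict[int, Dict[str, bool]] = {}
    for j in range(len(scores[0])):
        item = [row[j] for row in scores]
        counts = Counter(item)
        weak = pearson(item, totals) < min_r
        sparse = any(counts.get(opt, 0) < min_count for opt in options)
        if weak or sparse:
            flags[j] = {"weak_correlation": weak, "sparse_option": sparse}
    return flags
```

An item can be flagged on either criterion or on both, which mirrors the three counts reported in the text (former category, latter category, and both).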

In summary, a total of 23 items were deleted and 173 items revised on 
the basis of the editorial review and Fort Campbell findings. Items de- 
leted were those that did not "fit well" either conceptually or statis- 
tically, or both, with the other items in the scale and with the construct 
in question. If the item appeared to have a "good fit" but was not clear 
or did not elicit sufficient variance, it was revised rather than deleted. 
The ABLE, which had begun at 291 items, was now a revised 268-item
inventory ready to be administered at Fort Lewis.

The ABLE inventory was completed by 118 soldiers during the 11-15 June
pilot testing at Fort Lewis. Item response frequency distributions were
examined to detect items with relatively little discriminatory power.
There were only three items where two of the three response choices were
endorsed by less than 10% of the sample (not including validity scale
items). After examining the content of these three items, it was decided
to leave two of them intact, and delete one. Twenty items were revised
because one of the three response choices was endorsed by less than 10
percent of the sample.

Overall, the inventory appeared to be functioning well and only minor
revisions were required prior to the field test. On the following pages, the
psychometric data obtained during the two pilot tests are presented.

PILOT TEST DATA FOR THE ABLE



Fort Campbell 

We begin the presentation of Fort Campbell pilot test data with the 
results of data screening for the ABLE. The responses of four soldiers were
eliminated from analyses: two because more than 10 percent of the data was
missing, and two because their Non-Random Response scale scores suggested
possible random responding (scores less than 7 out of 8). The total N remaining
was 52.
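
The two screening rules can be expressed compactly. The following Python is a sketch under assumptions, not the screening program actually used: the function names and the answer key passed in are hypothetical, a missing response is represented as `None`, and the missing-data proportion is taken over whatever list of responses is supplied.

```python
from typing import Optional, Sequence


def non_random_score(answers: Sequence[Optional[int]], key: Sequence[int]) -> int:
    """Non-Random Response score: 1 point for each keyed (correct) answer,
    0 for either of the other two options or for a missing answer."""
    return sum(1 for a, k in zip(answers, key) if a == k)


def keep_inventory(responses: Sequence[Optional[int]],
                   nrr_answers: Sequence[Optional[int]],
                   nrr_key: Sequence[int],
                   max_missing: float = 0.10,
                   min_nrr: int = 7) -> bool:
    """Return True if an inventory survives both screens.

    Dropped when more than max_missing of the responses are missing, or
    when the Non-Random Response score is below min_nrr (7 of 8 in the text).
    """
    missing = sum(1 for r in responses if r is None)
    if missing / len(responses) > max_missing:
        return False
    return non_random_score(nrr_answers, nrr_key) >= min_nrr
```

With the thresholds given in the text, an inventory with one wrong Non-Random Response item (score of 7) is retained, and one with two wrong (score of 6) is dropped.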

Table 7.2 presents means, standard deviations, mean item-total
correlations, and Hoyt internal consistency reliabilities for each ABLE scale. The
Poor Impression scale is not shown in this table because it was not scored for
this sample. This scale is made up almost entirely from items appearing on
other scales and, as described earlier, was intended to detect respondents
trying to simulate psychopathology, usually for purposes of avoiding entry
into the military. Because these subjects were volunteers currently on active
duty, the sample size was small, and we had invoked no experimental conditions
designed to elicit a range of scores on this scale, we did not
score or analyze this scale on this sample.
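
For readers unfamiliar with the Hoyt estimate, it is the analysis-of-variance form of internal consistency reliability: one minus the ratio of the residual (person-by-item) mean square to the between-persons mean square. It is algebraically equal to coefficient alpha. The Python below is a minimal sketch of that computation; the function name is hypothetical, the data are assumed complete (no missing responses), and this is not the program used to produce Table 7.2.

```python
from typing import Sequence


def hoyt_reliability(scores: Sequence[Sequence[float]]) -> float:
    """Hoyt ANOVA reliability for a persons-by-items score matrix.

    scores -- one row per person, one column per item, no missing data
    """
    n = len(scores)          # persons
    k = len(scores[0])       # items
    grand = sum(sum(row) for row in scores) / (n * k)
    person_means = [sum(row) / k for row in scores]
    item_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_persons = k * sum((m - grand) ** 2 for m in person_means)
    ss_items = n * sum((m - grand) ** 2 for m in item_means)
    ss_resid = ss_total - ss_persons - ss_items

    ms_persons = ss_persons / (n - 1)
    ms_resid = ss_resid / ((n - 1) * (k - 1))
    return 1.0 - ms_resid / ms_persons
```

For example, the four-person, three-item matrix `[[1, 1, 1], [2, 1, 2], [2, 3, 3], [3, 3, 3]]` yields 12/13, or about .92, the same value coefficient alpha gives for those data.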

The reliabilities of the ABLE scales are excellent. In Table 7.3, the 
scale intercorrelations are shown. It is interesting to note the low correla- 
tions between the Unlikely Virtues scale, which is an indicator of Social 
Desirability, and the other scales. This finding, although based on a small 
sample, suggests that soldiers were not responding only in a socially desir- 
able fashion, but instead were responding honestly. 

The matrix of 10 ABLE scale intercorrelations (Physical Condition and the
validity scales were not included) was factor analyzed (principal factor
analysis) and rotated to a simple structure (varimax rotation). The four-factor
solution that appeared most meaningful is shown in Table 7.4. We labeled the
four factors Potency, Socialization, Dependability, and Likeability.

The scales loading highest on Factor I, Potency, are Dominance, Energy 
Level, and Self-Esteem; the scales loading highest on Factor II, Socializa- 
tion, are Locus of Control, Traditional Values, and Nondelinquency; the scales 
loading highest on Factor III, Dependability, are Conscientiousness and Work 
Orientation; the scales loading highest on Factor IV, Likeability, are 
Emotional Stability and Cooperativeness. These results are, however, viewed
as extremely tentative, given the small sample size upon which the factor 
analysis was based. 

In addition to the ABLE, four well-established measures of temperament 
had been administered to 46 Fort Campbell soldiers to serve as marker vari- 
ables: the Socialization scale of the California Psychological Inventory, 
Rotter's Locus of Control scale, and the Stress Reaction scale and Social 
Potency scale of the Differential Personality Questionnaire. The four scales 
(known as the Personal Opinion Inventory, POI) had also been used earlier in 
this project as part of the Preliminary Battery. 

Data screening for this joint administration of the ABLE and the POI
marker variables resulted in elimination of three inventories (two on the ABLE
and one on the POI) because more than 10 percent of the data was missing, and
additional inventories (including three on the POI) because of low scores on
the random-response checks. The responses of the remaining 38 soldiers were
used to compute correlations between ABLE scales and the markers.

Results are shown in Table 7.5. It can be seen that a given ABLE
scale correlates most highly with the appropriate marker scale, that is,
the marker for the construct to be measured. For example, the ABLE
Dominance scale correlates much higher with DPQ Social Potency (.67) than
with the other three marker scales, which are not related to that
construct. While these results are based on a small sample, they do
indicate that the ABLE scales appear to be measuring the constructs they
were intended to measure.

Table 7.2

Fort Campbell Pilot Test: ABLE Scale Statistics
(N = 52)

                                                        Mean
                               No.                   Item-Total      Hoyt
                              Items    Mean     SD   Correlation  Reliability

ABLE Substantive Scales

ADJUSTMENT
  Emotional Stability           31    72.06   9.10      .47          .87

DEPENDABILITY
  Nondelinquency                24    55.90   6.28      .40          .80
  Traditional Values            --      --     --       .07          .73
  Conscientiousness             24    58.04   5.83      .41          .80

ACHIEVEMENT
  Work Orientation              31    74.46   8.02      .42          .84
  Self-Esteem                   16    37.35   5.03      .54          .84

LEADERSHIP (POTENCY)
  Dominance                     17    37.67   5.04      .53          --
  Energy Level                  27    61.29   7.19      .46          .85

LOCUS OF CONTROL
  Internal Control              21    50.98   6.34      .46          .84

AGREEABLENESS/LIKEABILITY
  Cooperativeness               28    63.81   6.99      .39          .82

PHYSICAL CONDITION
  Physical Condition            14    43.08   9.66      .66          .92

ABLE Response Validity Scales
  Non-Random Response            8      --     --       --           --
  Unlikely Virtues              12    17.98   3.19      .38          .37
  Self-Knowledge                13    31.42   3.68      .43          .61

NOTE: A dash (--) marks an entry that is blank or not legible in the source
copy.

Table 7.3

Fort Campbell Pilot Test: ABLE Scale Intercorrelations
(N = 52)

                           1   2   3   4   5   6   7   8   9  10  11  12  13

 1. Emotional Stability   --  45  51  42  42  61  42  53  47  56  22  06  13
 2. Nondelinquency        45  --  71  67  51  53  25  33  58  52  01  13  31
 3. Traditional Values    51  71  --  58  59  56  33  54  70  56  22  19  24
 4. Conscientiousness     42  67  58  --  79  68  44  61  53  53  20  09  40
 5. Work Orientation      42  51  59  79  --  72  52  77  59  47  18  10  39
 6. Self-Esteem           61  53  56  68  72  --  65  73  62  41  26  10  22
 7. Dominance             42  25  33  44  52  65  --  62  34  08  35 -03  23
 8. Energy Level          53  33  54  61  77  73  62  --  55  38  27  15  21
 9. Internal Control      47  58  70  53  59  62  34  55  --  42  06 -03  27
10. Cooperativeness       56  52  56  53  47  41  08  38  42  --  11  16  14
11. Physical Condition    22  01  23  20  18  26  35  27  06  11  --  06  02
12. Unlikely Virtues      06  13  19  09  10  10 -03  15 -03  16  06  -- -09
13. Self-Knowledge        13  31  24  40  39  22  23  21  27  14  02 -09  --

NOTE: Decimals have been omitted.

Table 7.4

Varimax Rotated Principal Factor Analyses of 10 ABLE Scales

                                         Factor
ABLE Scale                   I        II       III       IV

Dominance                   --       .12       .11       .00
Energy Level                --       .20       .42       .24
Self-Esteem                 --       .39       .33       .26
Internal Control           .35       --        .14       .14
Traditional Values         .22       --        .24       .29
Nondelinquency             .04       --        .36       .22
Conscientiousness          .32       .39       --        .17
Work Orientation           .51       .32       --        .13
Emotional Stability        .46       .29      -.05       --
Cooperativeness           -.07       .29       .46       --

NOTE: A dash (--) marks a loading that is not legible in the source copy. In
each case it falls on the factor for which the text lists the scale among
those loading highest.

Table 7.5

Fort Campbell Pilot Test: Correlations Between ABLE Constructs and Scales
and Personal Opinion Inventory (POI) Marker Variables*
(N = 38)

POI scales:   DPQ Stress Reaction, DPQ Social Potency, Rotter Locus of
              Control, CPI Socialization
ABLE scales:  Emotional Stability, Dominance, Internal Control,
              Nondelinquency

     -.70     -.24      .32      .34
      .32      .67      .26      .10
      .30      .18      .67      .32
      .32      .22      .60      .62

* "Marker" correlations are indicated by a box in the original; they are the
diagonal entries (-.70, .67, .67, .62), pairing each ABLE scale above with
the POI scale in the same position.



Fort Lewis

Soldiers at the Fort Lewis pilot test in June 1984 completed the revised
version of the ABLE along with the AVOICE, the cognitive tests, and the
psychomotor tests that comprised the entire Pilot Trial Battery. The final N
for statistical analyses of the ABLE was 106; 1 inventory was eliminated
because more than 10 percent of the data was missing, and 11 were eliminated
because Non-Random Response was less than 7 (out of 8).

The means, standard deviations, mean item-total scale correlations,
and Hoyt reliability estimates appear in Table 7.6 for the entire group
(after screening). (Again, Poor Impression scale scores were not computed
for reasons stated earlier.) As can be seen, the reliabilities of the ABLE
scales are again excellent.

Tables 7.7 and 7.8 show the scale means and standard deviations for 
maler and females, and blacks and whites, respectively. Note that the Ns 
are quite small 'or females and blacks, but these statistics do not show 
any striking differences between subgroups. 

In Tablv<» 7.9, the scale intercorrelations are presented for all ABLE 
scales exc2pt the Non-Random Response and Poor Impression validity scales. 
It can be seen that iri the Fort Lewis data. Unlikely Virtues (Social Desir- 
ability) correlates more highly with other scales than in the fort Campbell 
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Table 7.6 

Fort Lewis Pilot Test: A RLE Scale Statistics for Total Rrnnp 



Mean 

No. Item-Total Hoyt 

lism JL JlsaiL so correlation Reliability 



ABLE Substantive Sca1«»s 

AOJUSTHENT 

Emotional Stability 

DEPENDABILITY 

Ncndelinquency 
Traditional Values 
Conscientiousness 

ACHIEVEMENT 

Work Orientation 
Self- Esteem 

LEADERSHIP (POTENCY) 

Dominance 
Energy Level 

LOCUS OF CONTROL 
Internal Control • 

AGREEABLEKESS/LIKEABILITY 
Cooperativeness 

PHYSICAL CONDITION 
Physical Condition 

ABLE Validity SralP. 

Non-Random Response 
Unlikely Virtues 
Self -Knowledge 



30 


106 


68.97 


C.59 


.46 


.87 


25 


106 


59.07 


6.28 


.40 


.78 


16 


106 


37.39 


4.25 


.41 


.67 


21 


106 


50.24 


5.31 


.41 


.75 


27 


106 


62.88 


7.77 


.48 


.86 


15 


106 


34.90 


4.71 


.52 


.80 


16 


106 


36.55 




.3/ 


• OD 


25 


106 


59.26 


7 40 




M 
• OO 


21 


106 


49.90 


6.27 


.46 


.80 


25 


106 


56.41 


6.70 


.43 


.81 


9 


106 


31.30 


6.96 


.73 


.87 


8 


117 


7.55 


.71 


.43 




12 


106 


16.63 


3.45 


.48 


.71 


13 


106 


29.75 


3.96 


.46 


.71 
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Table 7.7 



Fort Lewis Pilot Test; ABLE Scale Me ans and st^n^grd Deviations Separately 
for Males and Females 





Males 
(N - 87) 


Females 
(N - 19) 




Mean 


SD 


Mpan 


SD 


ABLE Substantive Scales 










ADJUSTMENT 










Emotional Stability 


69.78 


8.88 


65.26 


5.82 


DEPENDABILITY 










Nondellnquency 
Traditional Values 
Conscientiousness 


58.46 
37.13 
49 95 


6.28 
4.38 
5 49 


61 84 
38.58 


5 46 
3.30 


ACHIEVEMENT 










Work Orientation 
Self -Esteem 


62.17 
34.72 


7.78 
4.73 


66 11 
35.68 


6 89 
4.53 


LEADERSHIP (POTENCY) 










Dominance 
Energy Level 


36.66 
59.21 


6.10 
7.65 


36.05 
59.53 


5.95 
6.12 


LOCUS OF CONTROL 










Internal Control 


49.66 


6.31 


51.00 


5.93 


AGREEABLENESS/LIKEABILITY 










Cooperatlveness 


55 93 




DO. Do 


4.01 


PHYSICAL CONDITION 










Physical Condition 


31.64 


6.20 


29.74 


9.54 


ABLE Validity Scales 










Non-Random Response^ 
Unlikely Virtues 
Self -Knowledge 


7.50 
16.63 
29.54 


.72 
3.57 
4.00 


7.76 
16.63 
30.74 


.61 
2.81 
3.64 



Scale means and standard deviations are given here for data which are un- 
screened with respect to this scale. Thus, the N for males is 96 and for 
females is 21. 
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Table 7.8 



Fort Lewis Pilot Test: ABLE Scale Means and Standard Deviations Separately
for Blacks and Whites

                                  Blacks               Whites
                                 (N = 26)             (N = 63)
                               Mean      SD         Mean      SD

ABLE Substantive Scales

ADJUSTMENT
  Emotional Stability     [illegible] [illegible] [illegible] [illegible]

DEPENDABILITY
  Nondelinquency             60.65     6.06        58.86     6.37
  Traditional Values      [illegible] [illegible] [illegible]  4.00
  Conscientiousness          50.69     4.45        50.29     5.76

ACHIEVEMENT
  Work Orientation           63.50     6.40        62.73     8.63
  Self-Esteem                34.54     4.25        35.29     4.88

LEADERSHIP (POTENCY)
  Dominance                  37.77     3.43        36.75     6.80
  Energy Level               57.35     5.84        59.83     8.26

LOCUS OF CONTROL
  Internal Control           49.69     4.74        50.35     6.81

AGREEABLENESS/LIKEABILITY
  Cooperativeness         [illegible] [illegible] [illegible] [illegible]

PHYSICAL CONDITION
  Physical Condition         31.92     5.94        30.95     7.11

ABLE Validity Scales
  Non-Random Response*        7.40      .80         7.69      .52
  Unlikely Virtues           16.15     2.74        16.63     3.68
  Self-Knowledge             31.23     3.46        29.43     4.09

*Scale means and standard deviations are given here for data which are
unscreened with respect to this scale. Thus, the N is 30 for Blacks and 65
for Whites.
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Table 7.9 

Fort Lewis Pilot Test: ABLE Scale Intercorrelations
(N = 106)

                            1    2    3    4    5    6    7    8    9   10   11   12   13
 1. Emotional Stability    --   35   29   32   40   56   36   69   50   50   15   32   11
 2. Nondelinquency         35   --   68   63   61   40   25   46   55   61   22   42   13
 3. Traditional Values     29   68   --   55   54   41   31   45   64   50   22   31   10
 4. Conscientiousness      32   63   55   --   76   59   45   60   53   40   35   46   31
 5. Work Orientation       40   61   54   76   --   74   58   74   54   44   35   41   34
 6. Self-Esteem            56   40   41   59   74   --   64   72   57   44   33   31   32
 7. Dominance              36   25   31   45   58   64   --   53   38   17   37   15   25
 8. Energy Level           69   46   45   60   74   72   53   --   62   50   30   35   28
 9. Internal Control       50   55   64   53   54   57   38   62   --   63   13   21   28
10. Cooperativeness        50   61   50   40   44   44   17   50   63   --   19   30   28
11. Physical Condition     15   22   22   35   35   33   37   30   13   19   --   14   17
12. Unlikely Virtues       32   42   31   46   41   31   15   35   21   30   14   --  -03
13. Self-Knowledge         11   13   10   31   34   32   25   28   28   28   17  -03   --

NOTE: Decimals have been omitted.
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data. Table 7.10 presents the scale intercorrelations for the ten ABLE 
substantive scales (excluding the validity and Physical Condition scales)
with Social Desirability variance partialed out. As would be expected
given the correlation between Unlikely Virtues and the other ABLE scales, 
the values in Table 7.10 are from 3 to 10 points lower than in Table 7.9. 
There is no readily apparent explanation for the differences in findings 
between the Fort Campbell and Fort Lewis samples except for sampling error, 
since both sample sizes are relatively small. 
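
The partialing step can be illustrated with a small sketch. The report does not state the formula used; the example below assumes the standard first-order partial correlation and assumes that the Unlikely Virtues scale serves as the Social Desirability measure. The function name is hypothetical. The three input correlations are taken from Table 7.9 (Emotional Stability with Nondelinquency, and each of these with Unlikely Virtues).

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant.

    Assumption: this is the textbook formula, not one quoted from the report.
    """
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Table 7.9 values with decimals restored:
# Emotional Stability x Nondelinquency = .35
# Emotional Stability x Unlikely Virtues = .32
# Nondelinquency x Unlikely Virtues = .42
r = partial_corr(0.35, 0.32, 0.42)
print(round(r, 2))  # 0.25, which agrees with the corresponding entry (25) in Table 7.10
```

Under these assumptions the raw correlation of .35 drops by 10 points, which falls at the upper end of the range described above.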

Correlation matrices for the ten ABLE substantive scales from Fort
Lewis were factor analyzed, both with and without the Social Desirability
variance. Principal factor analyses were used, with rotation to simple
structure by varimax rotation. Both factor matrices appear in Table 7.11.
Though neither structure is the same as was obtained when we factor analyzed
the Fort Campbell correlation matrix, the factor solution resulting
when Social Desirability is partialed out is quite similar to the solution
obtained with the Fort Campbell data. One difference is that in the Fort
Lewis solution, Energy Level loads on a factor with Emotional Stability,
whereas in the Fort Campbell solution, Energy Level loads with Dominance
and Self-Esteem. The other difference is that in the five-factor Fort
Lewis solution, Cooperativeness forms a factor by itself, whereas in the
four-factor Fort Campbell solution, Cooperativeness forms a factor with
Emotional Stability.
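
For readers unfamiliar with the rotation step, the following is a minimal sketch of a raw varimax rotation carried out by successive pairwise (planar) rotations of factor columns. It is an illustration only: the report does not say which program or normalization was used, the function names are hypothetical, and the small loading matrix is invented for the example rather than taken from Table 7.11.

```python
import math


def varimax(loadings, max_sweeps=50, tol=1e-8):
    """Raw varimax (no Kaiser normalization) via pairwise planar rotations."""
    L = [row[:] for row in loadings]          # work on a copy
    p, k = len(L), len(L[0])
    for _ in range(max_sweeps):
        largest = 0.0
        for i in range(k - 1):
            for j in range(i + 1, k):
                u = [r[i] ** 2 - r[j] ** 2 for r in L]
                v = [2 * r[i] * r[j] for r in L]
                A, B = sum(u), sum(v)
                C = sum(a * a - b * b for a, b in zip(u, v))
                D = 2 * sum(a * b for a, b in zip(u, v))
                # Angle that maximizes the varimax criterion for this column pair
                phi = 0.25 * math.atan2(D - 2 * A * B / p, C - (A * A - B * B) / p)
                largest = max(largest, abs(phi))
                c, s = math.cos(phi), math.sin(phi)
                for r in L:
                    r[i], r[j] = c * r[i] + s * r[j], -s * r[i] + c * r[j]
        if largest < tol:                      # no pair needed further rotation
            break
    return L


def varimax_criterion(L):
    """Sum over factors of the (unnormalized) variance of squared loadings."""
    p = len(L)
    total = 0.0
    for j in range(len(L[0])):
        sq = [r[j] ** 2 for r in L]
        total += sum(x * x for x in sq) - sum(sq) ** 2 / p
    return total


# Hypothetical unrotated loadings for six scales on two factors
unrotated = [[0.70, 0.30], [0.65, 0.35], [0.60, 0.40],
             [0.55, -0.45], [0.50, -0.50], [0.45, -0.40]]
rotated = varimax(unrotated)
for row in rotated:
    print([round(x, 2) for x in row])
```

Because each step is an orthogonal rotation, the communality of every scale (the row sum of squared loadings) is unchanged; only the distribution of loadings across factors moves toward simple structure.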

The structure of the temperament and biodata domain, as measured by
the ABLE during the pilot tests, could not be specified with certainty due 
to the relatively small pilot test sample upon which the correlational and 
factor analyses were run. The scales do, however, appear to be measuring 
the same content as the corresponding marker variables that were a part of 
the Preliminary Battery. The internal consistency reliabilities and score 
distributions of the ABLE scales are more than acceptable. 






Table 7.10 

Fort Lewis Pilot Test: ABLE Scale Intercorrelations
With Social Desirability Variance Partialed Out

                            1    2    3    4    5    6    7    8    9   10
 1. Emotional Stability    --   25   21   20   31   51   37   65   47   45
 2. Nondelinquency         25   --   63   54   52   32   21   37   52   55
 3. Traditional Values     21   63   --   49   48   35   28   39   61   45
 4. Conscientiousness      20   54   49   --   70   53   44   53   50   31
 5. Work Orientation       31   52   48   70   --   71   57   69   51   36
 6. Self-Esteem            51   32   35   53   71   --   63   69   54   39
 7. Dominance              37   21   28   44   57   63   --   52   36   14
 8. Energy Level           65   37   39   53   69   69   52   --   60   44
 9. Internal Control       47   52   61   50   51   54   36   60   --   61
10. Cooperativeness        45   55   45   31   36   39   14   44   61   --

NOTE: Decimals have been omitted.
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Table 7.11 

Fort Lewis Pilot Test: Varimax Rotated Principal Factor Analyses
of 10 ABLE Scales

Five-Factor Solution
With Social Desirability Variance Included

ABLE Scale                  I          II         III        IV         V
Dominance                  .56        .15        .16        .00        .21
Energy Level               .45        .19        .32        .22        .79
Self-Esteem                .90        .13        .22        .30        .27
Internal Control           .33     [missing]     .15        .44        .29
Traditional Values         .18        .78        .29        .22        .10
Nondelinquency             .09        .50        .56        .41        .09
Conscientiousness          .40        .34        .61        .14        .16
Work Orientation           .57        .25    [illegible]    .15        .24
Emotional Stability        .33        .11        .02        .43        .53
Cooperativeness            .08        .30        .21        .77        .22

Five-Factor Solution
With Social Desirability Variance Partialed Out

ABLE Scale                  I          II         III        IV         V
Dominance               [missing]     .15        .23       -.03        .18
Energy Level               .39        .18        .8?        .13        .36
Self-Esteem                .79        .12        .32        .24        .19
Internal Control           .31        .52        .34        .40        .14
Traditional Values         .17        .83        .10        .17        .18
Nondelinquency             .08        .56        .06        .40        .42
Conscientiousness          .40        .37        .11        .11        .56
Work Orientation           .57        .27        .22        .13     [missing]
Emotional Stability        .30        .08        .60        .35       -.06
Cooperativeness            .06        .31        .26        .78        .13
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INTERESTS CONSTRUCTS 



The seminal work of John Holland (1966) has resulted in widespread
acceptance of a six-construct, hexagonal model of interests. Our principal
problem in developing and testing an interests measure for Army testing was
not which constructs to measure, but rather how much emphasis should be
devoted to the assessment of each.

As earlier stated, the interests inventory that had been used in the
Preliminary Battery is called the VOICE (Vocational Interest Career
Examination), which had been developed and researched by the U.S. Air Force.
This inventory served as the starting point for the AVOICE (Army Vocational
Interest Career Examination).

When developing the AVOICE, we sought to ensure that it would measure
well all six of Holland's constructs, as well as provide sufficient coverage
of the vocational areas most important in the Army. We wanted the
inventory's items to parallel the job tasks of soldiers in a variety of
MOS, while at the same time assessing a respondent's broad interests.
Each of the constructs to be discussed next is adequately measured by
the AVOICE; however, a greater degree of coverage is devoted to constructs
judged most important for Army jobs. Table 7.12 shows the six Holland
interests constructs assessed by the AVOICE, together with their associated
scales.

In addition to the Holland constructs and associated scales, the
AVOICE also included six constructs (20 scales) dealing with organizational
climate and environment preferences and an expressed interests scale.
Table 7.13 shows these variables and associated measures.

As used in the pilot testing, the AVOICE included 306 items. Nearly 
all items were scored on a 5-point scale that ranged from "Like Very Much"
(scored 5) to "Dislike Very Much" (scored 1). Items in the Expressed
Interests scale were scored on a 3-point scale in which the response op- 
tions were different for each item, yet one option always reflected the 
most interest, one moderate interest, and one the least interest. 
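
The scoring rule can be sketched as follows. This is an illustration under an assumption the text does not state outright: that a scale score is the unweighted sum of its item scores (the means reported for the interest scales are consistent with this). The function name and the response vector are hypothetical.

```python
def scale_score(item_scores, points=5):
    """Sum item scores after checking each lies on the 1..points response scale.

    points=5 for the "Like Very Much" ... "Dislike Very Much" items,
    points=3 for Expressed Interests items (assumed scored 1 to 3).
    """
    for s in item_scores:
        if not 1 <= s <= points:
            raise ValueError(f"item score {s} is outside 1..{points}")
    return sum(item_scores)


# Hypothetical responses to a 16-item scale such as Mechanics
mechanics = [4, 3, 5, 2, 3, 4, 4, 3, 2, 5, 3, 3, 4, 2, 3, 4]
print(scale_score(mechanics))
```

Under this assumption a 16-item scale can range from 16 to 80.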

We now discuss, in turn, each construct/category and the scales devel- 
oped for it. 

Realistic Interests 

This construct is defined as a preference for concrete and tangible 
activities, characteristics, and tasks. Persons with realistic interests 
enjoy and are skilled in the manipulation of tools, machines, and animals, 
but find social and educational activities and situations aversive. Real- 
istic interests are associated with occupations such as mechanic, engineer, 
and wildlife conservation officer, and negatively associated with such 
occupations as social work and artist. 

The Realistic construct is by far the most thoroughly assessed of the 
six constructs tapped by the AVOICE, reflecting the preponderance of work 
in the Army of a Realistic nature. Fourteen AVOICE scales fall under this
construct, in addition to a Basic Interest item.
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Table 7.12 



Holland Basic Interest Constructs, and Army Vocational Interest Career
Examination Scales Developed for Pilot Trial Battery:
Army Vocational Interest Career Examination (AVOICE)

Construct          Scale

Realistic          Basic Interest Item
                   Mechanics
                   Heavy Construction
                   Electronics
                   Electronic Communication
                   Drafting
                   Law Enforcement
                   Audiographics
                   Agriculture
                   Outdoors
                   Marksman
                   Infantry
                   Armor/Cannon
                   Vehicle Operator
                   Adventure

Conventional       Basic Interest Item
                   Office Administration
                   Supply Administration
                   Food Service

Social             Basic Interest Item
                   Teaching/Counseling

Investigative      Basic Interest Item
                   Medical Services
                   Mathematics
                   Science/Chemical
                   Automated Data Processing

Enterprising       Basic Interest Item
                   Leadership

Artistic           Basic Interest Item
                   Aesthetics
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Table 7.13 

Additional AVOICE Measures: Organizational Climate/Environment and
Expressed Interests Scales

Construct                                  Scale

Achievement (Org. Climate/Environment)     Achievement
                                           Authority
                                           Ability Utilization

Safety (Org. Climate/Environment)          Organizational Policies and
                                             Procedures
                                           Supervision - Human Resources
                                           Supervision - Technical

Comfort (Org. Climate/Environment)         Activity
                                           Variety
                                           Compensation
                                           Security
                                           Working Conditions

Status (Org. Climate/Environment)          Advancement
                                           Recognition
                                           Social Status

Altruism (Org. Climate/Environment)        Co-workers
                                           Moral Values
                                           Social Services

Autonomy (Org. Climate/Environment)        Responsibility
                                           Creativity
                                           Independence

Expressed Interests                        Expressed Interests
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The Basic Interest item, one of which is written for each Holland con- 
struct, describes a person with prototypic Realistic interests. The re- 
spondent indicates how well this description fits him/her. The remaining 
Realistic scales are discussed next. 

• The Mechanics scale is a 16-item scale that measures interest in
various kinds of mechanical work. Sample items include:

- Replace valves in an engine.

- Adjust a carburetor.

• Heavy Construction is a 23-item scale dealing with interest in
construction tasks. Example items are:

- Mason. 

- Welder. 

- Construct a quick shelter in the woods. 

• Twenty items are included on the Electronics scale. Items from 
this scale include these: 

- Repair a television set. 

- Design a circuit board. 

- Wiring diagrams. 

• The Electronic Communication scale concerns interest in transmitting
information electronically. This 7-item scale includes such
items as:

- Operate radio and teletype equipment. 

- Telecommunications. 

• Drafting is a Realistic scale with seven items. Among the
Drafting scale items are:

- Artist. 

- Draftsman. 

- Draw blueprints for a bridge. 

• Another Realistic scale is called Law Enforcement and includes both
security and law enforcement components. Three of the scale's 16 
items are: 

- Highway patrol officer. 

- Prison guard. 

- Be a witness at a criminal trial. 

• The Audiographics scale, which has seven items, concerns activities
associated with photography and movies. Items from this scale are: 

- Photographer. 

- Record the sound for a motion picture. 
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• One of the shortest Realistic scales, Agriculture , contains only 
five items. Two of the scale's items are: 

- Drive a tractor on a farm.

- Mow lawns, clip hedges, and trim trees.

• The Outdoors scale contains nine items including: 

- Work outdoors. 

- Go deer hunting. 

- Learn survival techniques for living in the wilderness. 

• The Marksman scale's five items include:

- Gunsmith. 

- Teach marksmanship. 

- Collect rifles and pistols. 

• The Infantry scale contains ten activities engaged in by infantry- 
men. Among these items are: 

- Use cover, concealment, and camouflage. 

- Clear a mine field. 

- Direct artillery fire. 

• Armor/Cannon is an 8-item scale that pertains to operating large 
ground-based weapons. The items include: 

- Zero in a tank's main gun. 

- Load and unload field artillery cannons. 

• The scale entitled Vehicle Operator includes the following among
its nine items:

- Taxi driver. 

- Deliver cargo on time. 

- Operate a bulldozer or power shovel. 

• Finally, the Adventure scale has eight items that include:

- Explore a wilderness area alone. 

- Go skydiving. 

- Hunt wild animals in Africa. 

Eight ABLE items are also scored on the Adventure scale. Thus, we 
could obtain Adventure scores based on AVOICE items only, ABLE items only, 
or both. In this section, we will deal only with the eight AVOICE 
Adventure items. 

Conventional Interests 

The construct of Conventional interests refers to one's degree of 
preference for well-ordered, systematic and practical activities and tasks. 
Persons with Conventional interests may be characterized as conforming, 
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unimaginative, efficient, and calm. Conventional Interests are associated 
with occupations such as accountant, clerk, and statistician, and nega- 
tively associated with occupations such as artist or author. 

In addition to the Basic Interest item, three scales fall under the
Conventional interests construct: Office Administration, Supply
Administration, and Food Service. They have, respectively, 16, 13, and 17 items.
Example items from these three scales are:

• Office Administration -

- Make copies of a letter.

- Keep accurate records.

- Schedule appointments for other people.

• Supply Administration -

- Prepare materials, equipment, or supplies for shipment.

- Make out invoices.

- Take inventory for a department store.

• Food Service -

- Dishwasher.

- Buy food supplies for a restaurant.

- Wash, peel and dice vegetables.

Social Interests 

Social interests are defined as the amount of liking one has for
social, helping, and teaching activities and tasks. Persons with Social
interests may be characterized as responsible, idealistic, and humanistic.
Social interests are associated with occupations such as social worker,
high school teacher, and speech therapist, and negatively associated with
occupations such as mechanic or carpenter.

• Besides the Basic Interest item, only one scale is included in the
AVOICE for assessing Social interests, the Teaching/Counseling
scale. This 7-item scale includes items such as:

- Give on-the-job training.

- Organize and lead a study group.

- Listen to people's problems and try to help them.

Investigative Interests 

This construct refers to one's preference for scholarly, intellectual, 
and scientific activities and tasks. Persons with Investigative interests 
enjoy analytical, ambiguous, and independent tasks, but dislike leadership
and persuasive activities. Investigative interests are associated with 
such occupations as astronomer, biologist, and mathematician, and nega- 
tively associated with occupations such as salesman or politician. 

Along with the Basic Interest item, Medical Services, Mathematics,
Science/Chemical, and Automated Data Processing are the four AVOICE scales 
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that tap Investigative Interests. The scales differ in length with Medical 
Services containing 24 items; Mathematics, 5; Science/Chemical, 11; and
Automated Data Processing, 7. Again, selected scale items are supplied
below. 

• Medical Services -

- Physical Therapist.

- Take blood pressure readings.

- Disease prevention.

• Mathematics -

- Solve arithmetic problems.

- Find information in numerical tables.

- Work with numbers.

• Science/Chemical -

- Mix chemical compounds.

- Record observations from scientific instruments.

- Work with hazardous chemicals.

• Automated Data Processing -

- Computer Operator.

- Computer Programmer.

- Operate a machine that sorts punched cards.

Enterprising Interests

The Enterprising interests construct refers to one's preference for
persuasive, assertive, and leadership activities and tasks. Persons with
Enterprising interests may be characterized as ambitious, dominant, sociable,
and self-confident. Enterprising interests are associated with such
occupations as salesperson and business executive, and negatively associated
with occupations such as biologist or chemist.

• Again, besides the Basic Interest item, only one AVOICE scale
assesses the respondent's Enterprising interests. This scale,
entitled Leadership, contains six items including the following:

- Mold a group of coworkers into an efficient team. 

- Inspire others with a speech. 

- Make decisions when others do not know what to do. 

Artistic Interests

This final Holland construct is defined as a person's degree of liking 
for unstructured, expressive, and ambiguous activities and tasks. Persons 
with Artistic interests may be characterized as intuitive, impulsive, 
creative, and non-conforming. Artistic interests are associated with such 
occupations as writer, artist, and composer, and negatively associated with 
occupations such as accountant or secretary. 
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• In addition to the Basic Interest item, the AVOICE Aesthetics scale 
is designed to tap Artistic interests, and includes five items.
Among these items are: 

- Read poetry. 

- Watch educational television. 

- Classical music. 

Organizational Climate/Environment Scales 

Six constructs that pertain to a person's preference for certain types
of work environments and conditions are assessed by the AVOICE through 20
scales of two items each. These environmental constructs include Achievement,
Safety, Comfort, Status, Altruism, and Autonomy. The items that assess these
constructs are distributed throughout the AVOICE, and are responded to in
the same manner as the interests items, that is, "Like Very Much" to
"Dislike Very Much."

Because the scales contain only two items each and for ease of
presentation, Figure 7.1 is used to show the constructs, scales, and an item
from each scale.

Expressed Interests Scale 

Although not a psychological construct, expressed interests were included
in the AVOICE because of the extensive research showing their validity
in criterion-related studies (Dolliver, 1969). These studies had
measured expressed interests simply by asking respondents what occupation
or occupational area was of most interest to them. In the AVOICE, such an
open-ended question was not feasible; instead, respondents were asked how
confident they were that their chosen job in the Army was the right one for
them.

This Expressed Interests scale contained eight items which, as mentioned,
had three response options that formed a continuum of confidence in the
person's occupational choice. Selected items from this scale include:

- Before you went to the recruiter, how certain were you of
the job you wanted in the Army?

- If you had the opportunity right now to change your job in 
the Army, would you? 

- Before enlisting, how long were you interested in a particu- 
lar Army job? 
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Construct/Scale 

Achievement
  Achievement             "Do work that gives a feeling of accomplishment."
  Authority               "Tell others what to do on the job."
  Ability Utilization     "Make full use of your abilities."

Safety
  Organizational Policy   "A job in which the rules are not equal for everyone."
  Supervision -
    Human Resources       "Have a boss that supports the workers."
  Supervision -
    Technical             "Learn the job on your own."

Comfort
  Activity                "Work on a job that keeps a person busy."
  Variety                 "Do something different most days at work."
  Compensation            "Earn less than others do."
  Security                "A job with steady employment."
  Working Conditions      "Have a pleasant place to work."

Status
  Advancement             "Be able to be promoted quickly."
  Recognition             "Receive awards or compliments on the job."
  Social Status           "A job that does not stand out from others."

Altruism
  Co-workers              "A job in which other employees were hard to get
                          to know."
  Moral Values            "Have a job that would not bother a person's
                          conscience."
  Social Services         "Serve others through your work."

Autonomy
  Responsibility          "Have work decisions made by others."
  Creativity              "Try out your own ideas on the job."
  Independence            "Work alone."

Figure 7.1. Organizational climate/environment preference constructs,
scales within constructs, and an item from each scale.
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AVOICE REVISIONS BASED ON PILOT TESTING 



As with the ABLE, before we present the data obtained from pilot testing,
we describe the revisions made in the AVOICE on the basis of pilot
test administration at Fort Campbell and Fort Lewis. Again, the changes
are discussed for the AVOICE as a whole, rather than scale by scale. These
changes resulted in the AVOICE version to be used in the field test.

Overall, the revisions made were far less substantial for the AVOICE 
than for the ABLE. Editorial review of the inventory by PDRI and ARI
staff, together with the verbal feedback from Fort Campbell soldiers, 
resulted in revision of 15 items--primarily minor wording changes. An 
additional five items were modified because of low item correlations with 
the total scale score in the Fort Campbell data. No items were deleted 
based on the editorial review, verbal feedback, or item analyses. 

Following the Fort Lewis pilot test, no revisions or deletions were 
made to the AVOICE items. Item response frequencies were examined to 
detect items that had relatively little discriminatory power, that is, 
three or more of the five response choices received less than 10 percent 
endorsement. There proved to be only two such items, and, upon examination 
of the item content, it was decided not to revise these. Both items 
appeared well written and relevant to the targeted content, and we thought 
the poor response distribution could be attributed to sampling error. 
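
The screening rule just described (three or more of the five response choices endorsed by fewer than 10 percent of respondents) can be sketched as a small check. The function name, parameter names, and response vectors are hypothetical and serve only to illustrate the rule as worded here.

```python
from collections import Counter


def low_discrimination(responses, options=(1, 2, 3, 4, 5),
                       floor=0.10, min_sparse=3):
    """Return True if at least `min_sparse` response options were each
    chosen by less than `floor` (a proportion) of the respondents."""
    if not responses:
        raise ValueError("no responses to evaluate")
    counts = Counter(responses)
    n = len(responses)
    sparse = sum(1 for option in options if counts[option] / n < floor)
    return sparse >= min_sparse


# Hypothetical item: nearly everyone answers 4 or 5
skewed_item = [5] * 60 + [4] * 35 + [3] * 3 + [2] * 1 + [1] * 1
# Hypothetical item: responses spread evenly over the five options
even_item = [1, 2, 3, 4, 5] * 20

print(low_discrimination(skewed_item))  # True: options 1, 2, and 3 each fall under 10%
print(low_discrimination(even_item))    # False: every option is at 20%
```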

Thus, a total of only 20 AVOICE items were revised on the basis of 
editorial review and pilot testing. Part of this low level of revision may 
be due to the common response scale of the inventory, "Like Very Much" to 
"Dislike Very Much." The response options appeared to be well understood
and did not require the item-by-item review/revision that was necessary for 
the ABLE items (which had differing response options by item). 
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PILOT TEST DATA FOR THE AVOICE 



Fort Campbell

In the Fort Campbell pilot test, a total of 57 soldiers completed the 
AVOICE, 55 of whom provided sufficient data for analyses. Scale statistics 
for this sample are presented in Table 7.14. As can be seen in the table, 
the mean item-total correlations and Hoyt reliabilities are excellent, 
generally in the .60s to .80s for the former, and .70s to .90s for the 
latter. In addition, the means and SDs indicate acceptable scale score
distributions in almost all cases. 
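
The two statistics summarized above can be computed as in the following sketch. Hoyt's reliability is obtained from a persons-by-items analysis of variance as 1 minus the ratio of the residual mean square to the between-persons mean square. The item-total correlations here correlate each item with the total score including that item; whether the report used corrected or uncorrected correlations is not stated, so this is an assumption. The function names and the small data set are hypothetical, and the sketch assumes complete data with non-constant items.

```python
import math


def hoyt_reliability(scores):
    """Hoyt's ANOVA reliability for complete data.

    scores: list of respondents, each a list of k item scores.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    person_means = [sum(row) / k for row in scores]
    item_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_persons = k * sum((m - grand) ** 2 for m in person_means)
    ss_items = n * sum((m - grand) ** 2 for m in item_means)
    ss_resid = ss_total - ss_persons - ss_items
    ms_persons = ss_persons / (n - 1)
    ms_resid = ss_resid / ((n - 1) * (k - 1))
    return 1 - ms_resid / ms_persons


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def item_total_correlations(scores):
    """Correlation of each item with the (uncorrected) total scale score."""
    totals = [sum(row) for row in scores]
    k = len(scores[0])
    return [pearson([row[j] for row in scores], totals) for j in range(k)]


# Hypothetical data: four respondents, three items on a 5-point scale
data = [[5, 4, 5], [3, 3, 4], [2, 2, 1], [4, 5, 4]]
print(round(hoyt_reliability(data), 3))
r_it = item_total_correlations(data)
print(round(sum(r_it) / len(r_it), 3))  # mean item-total correlation
```

For complete data, Hoyt's coefficient is algebraically equal to coefficient alpha, so either label may appear in later work on the same scales.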

Fort Lewis 

The responses of four of 118 soldiers were eliminated for exceeding 
the missing data criterion (10%), resulting in an analysis sample size of 
114. Scale statistics for this sample are shown in Table 7.15. Reliabili- 
ties are again excellent and are even slightly higher than the values 
obtained at Fort Campbell. 
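
The missing-data screen can be sketched as below. The sketch assumes that a respondent is dropped when more than 10 percent of the inventory's items are unanswered and that an unanswered item is coded as None; both the coding and the function name are assumptions made for illustration.

```python
def passes_missing_data_screen(responses, max_missing=0.10):
    """Return True if the proportion of unanswered items does not exceed
    `max_missing`. Unanswered items are assumed to be coded as None."""
    if not responses:
        raise ValueError("empty response record")
    missing = sum(1 for r in responses if r is None)
    return missing / len(responses) <= max_missing


# Hypothetical 306-item records (the pilot AVOICE had 306 items)
kept = [3] * 276 + [None] * 30      # 30 of 306 missing, about 9.8 percent
dropped = [3] * 275 + [None] * 31   # 31 of 306 missing, about 10.1 percent

print(passes_missing_data_screen(kept))
print(passes_missing_data_screen(dropped))
```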

AVOICE scale means and standard deviations were also calculated separately
for males and females and for blacks and whites (see Tables 7.16 and
7.17), but note that sample sizes are very small for females and blacks.
These data are viewed as exploratory only. As would be expected on the
basis of previous research, there are marked differences between the sexes
in mean score on certain interest scales. Scales such as Mechanics and
Heavy Construction show far greater scores for males than females. On the
majority of the scales, however, the differences are less pronounced.
Differences are also relatively small between blacks and whites. Table 7.18
presents the AVOICE scale intercorrelations for the Fort Lewis sample. We
performed no detailed analyses of these correlations, but did inspect the
matrix to see if scales expected to correlate fairly highly did so (for
example, Infantry with Armor/Cannon) and scales expected to correlate
weakly, or even negatively, did so (for example, Aesthetics with Infantry).
This pattern did indeed hold true in most cases.
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Table 7.14 

Fort Campbell Pilot Test: AVOICE Scale Statistics (N = 55)

                                                           Mean
                             No.                        Item-Total      Hoyt
AVOICE Scale                Items    Mean       SD      Correlation  Reliability

REALISTIC
  Basic Interest Item          1     1.95       .75         --           --
  Mechanics                   16    49.91     14.54        .75          .95
  Heavy Construction          23    65.84     16.13        .64       [missing]
  Electronics                 20    65.45     17.48        .75          .96
  Electronic Communication     7    20.00      5.15        .64          .76
  Drafting                     7    20.84      5.04        .62          .75
  Law Enforcement             16    47.78     10.59        .55          .83
  Audiographics                7    23.05      4.32    [illegible]      .69
  Agriculture                  5    14.29      3.51        .60          .55
  Outdoors                     9    32.20      6.77     [missing]   [illegible]
  Marksman                     5    15.25      4.64        .77          .82
  Infantry                    10    26.93      6.66        .57          .78
  Armor/Cannon                 8    22.29      6.51        .71          .87
  Vehicle Operator             9    24.93  [illegible]     .69          .87
  Adventure                    8    18.87  [illegible]     .39       [missing]

CONVENTIONAL
  Basic Interest Item          1     2.02       .65         --           --
  Office Administration       16    41.84     13.37        .74          .94
  Supply Administration       13    32.64      9.88        .72          .92
  Food Service                17    39.18      8.18        .49          .81

SOCIAL
  Basic Interest Item          1     2.22       .78         --           --
  Teaching/Counseling          7    22.33      5.41        .67          .80

INVESTIGATIVE
  Basic Interest Item          1     1.38       .52         --           --
  Medical Services            24    66.02     17.46        .66          .95
  Mathematics                  5    14.09      3.79        .69          .73
  Science/Chemical            11    29.15      7.60        .61          .84
  Automated Data Processing    7    23.69      6.12        .73          .86

ENTERPRISING
  Basic Interest Item          1     1.84       .68         --           --
  Leadership                   6    19.93      4.88        .69          .78

ARTISTIC
  Basic Interest Item          1     1.62       .67         --           --
  Aesthetics                   5    13.33      4.00        .74          .79

(Continued)
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Table 7.14 (Continued) 

Fort Campbell Pilot Test: AVOICE Scale Statistics

                                                           Mean
                             No.                        Item-Total      Hoyt
AVOICE Scale                Items    Mean       SD      Correlation  Reliability

ACHIEVEMENT (Org. Climate/Environment)
  Achievement                  2     1.76      1.60        .75
  Authority                    2      .25      1.72        .70
  Ability Utilization          2     1.49      1.41        .76

SAFETY (Org. Climate/Environment)
  Organizational Policies
    and Procedures             2     2.09      1.27        .69
  Supervision-Human Resources  2     2.20      1.64        .74
  Supervision-Technical        2      .40      1.84        .68

COMFORT (Org. Climate/Environment)
  Activity                     2     1.45      1.55        .71
  Variety                      2     1.31      1.58        .81
  Compensation                 2     2.58      1.51        .75
  Security                     2     2.85      1.30        .77
  Working Conditions           2     1.98      1.51        .78

STATUS (Org. Climate/Environment)
  Advancement                  2     1.67      1.45        .69
  Recognition                  2     1.20      1.81        .73
  Social Status                2     1.42      1.69        .75

ALTRUISM (Org. Climate/Environment)
  Co-workers                   2     2.16      1.45        .83
  Moral Values                 2     1.60      1.66        .71
  Social Services              2     6.98      1.80        .82

AUTONOMY (Org. Climate/Environment)
  Responsibility               2     1.65      1.36        .66
  Creativity                   2      .91      1.38        .58
  Independence                 2     -.44      1.25        .69

EXPRESSED INTEREST             8    15.15      3.89        .54          .30
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Table 7.15 

Fort Lewis Pilot Test: AVOICE Scale Statistics for Total Group
(N = 114)

                                                           Mean
                             No.                        Item-Total      Hoyt
AVOICE Scale                Items    Mean       SD      Correlation  Reliability

REALISTIC
  Basic Interest Item          1     3.09      1.17         --           --
  Mechanics                   16    53.02     13.13        .73          .94
  Heavy Construction          23    72.57     15.64        .62          .92
  Electronics                 20    63.94     16.86        .75          .96
  Electronic Communication     7    21.44      5.73        .73          .85
  Drafting                     7    22.62      6.11        .76          .87
  Law Enforcement             16    50.82     11.33        .63          .89
  Audiographics                7    24.30      5.12        .69          .81
  Agriculture                  5    15.24      3.62        .61          .58
  Outdoors                     9    33.09      6.25        .62          .80
  Marksman                     5    16.57      4.48        .79          .84
  Infantry                    10    31.04      7.26        .64          .84
  Armor/Cannon                 8    23.46      6.15        .67          .83
  Vehicle Operator            10    30.45      7.10        .65          .84
  Adventure                    8    18.84      3.60        .57          .72

CONVENTIONAL
  Basic Interest Item          1     3.00       .92         --           --
  Office Administration       16    45.39     12.61        .72          .94
  Supply Administration       13    36.97      9.65        .71          .92
  Food Service                17    43.46     10.53        .59          .89

SOCIAL
  Basic Interest Item          1     3.25      1.03         --           --
  Teaching/Counseling          7    23.61      5.20        .71          .83

INVESTIGATIVE
  Basic Interest Item          1     3.09       .95         --           --
  Medical Services            24    71.32     16.65        .66          .94
  Mathematics                  5    15.82      4.20        .75          .80
  Science/Chemical            11    30.29      8.41        .68          .88
  Automated Data Processing    7    24.29      5.78        .74          .86

ENTERPRISING
  Basic Interest Item          1     3.11      1.13         --           --
  Leadership                   6    20.71      4.41        .72          .81

(Continued)
Table 7.15 (Continued)

Fort Lewis Pilot Test: AVOICE Scale Statistics for Total Group
(N = 114)

                                                           Mean
                             No.                        Item-Total      Hoyt
AVOICE Scale                Items    Mean       SD      Correlation  Reliability

ARTISTIC
  Basic Interest Item          1     2.99      1.27         --           --
  Aesthetics                   5    14.73      4.12        .74          .79

ORGANIZATIONAL CLIMATE/
ENVIRONMENT DIMENSIONS
  Achievement                  6    21.09      2.95         --           --
  Safety                       6    21.64      3.20         --           --
  Comfort                     10    38.50      3.83         --           --
  Status                       6    21.37      2.97         --           --
  Altruism                     6    21.67      3.28         --           --
  Autonomy                     6    20.46      2.33         --           --

EXPRESSED INTEREST             8    15.71      3.19        .59          .66



Table 7.16 



Fort Lewis Pilot Test: AVOICE Means and Standard Deviations
Separately for Males and Females

                                  Males               Females
                                 (N = 87)             (N = 19)
AVOICE Scale                   Mean      SD         Mean      SD

REALISTIC
  Basic Interest Item           3.24     1.13        2.35     1.11
  Mechanics                    54.93    12.51       44.05    12.28
  Heavy Construction           75.31    13.24       59.70    19.22
  Electronics                  66.38    15.95       52.45    16.23
  Electronic Communication     21.48     5.73       21.25     5.72
  Drafting                     22.97     6.11       21.00     5.83
  Law Enforcement              51.72    11.41       46.60     9.95
  Audiographics                24.27     5.03       24.45     5.52
  Agriculture                  15.46     3.59       14.20     3.57
  Outdoors                     33.94     5.75       29.10     6.92
  Marksman                     17.35     4.05       12.90     4.56
  Infantry                     31.94     7.14       26.85     6.28
  Armor/Cannon                 24.21     5.99       19.95     5.71
  Vehicle Operator             31.05     6.52       27.60     8.81
  Adventure                    19.39     3.28       16.32     3.91

CONVENTIONAL
  Basic Interest Item           2.97      .92        3.15      .91
  Office Administration        44.91    11.93       47.60    15.19
  Supply Administration        36.95     9.56       37.10    10.09
  Food Service                 42.54     9.89       47.80    12.23

SOCIAL
  Basic Interest Item           3.24     1.05        3.30      .95
  Teaching/Counseling          23.15     5.13       25.75     4.97

INVESTIGATIVE
  Basic Interest Item           3.10      .95        3.05      .97
  Medical Services             71.10    16.65       72.40    16.59
  Mathematics                  15.59     4.31       16.95     3.40
  Science/Chemical             30.99     8.69       27.00     5.96
  Automated Data Processing    24.20     5.97       24.70     4.76

(Continued)
Table 7.16 (Continued)

Fort Lewis Pilot Test: AVOICE Means and Standard Deviations
Separately for Males and Females

                                   Males             Females
                                  (N = 87)           (N = 19)
AVOICE Scale                    Mean      SD       Mean      SD

ENTERPRISING
  Basic Interest Item            3.14    1.14       2.95    1.02
  Leadership                    20.53    4.61      21.55    3.17

ARTISTIC
  Basic Interest Item            2.26    1.25       3.15    1.31
  Aesthetics                    14.29    4.22      16.80    2.77

ORGANIZATIONAL CLIMATE/
ENVIRONMENT DIMENSIONS
  Achievement                   20.97    2.92      21.65    3.02
  Safety                        21.59    3.36      21.90    2.23
  Comfort                       38.26    3.76      39.65    3.97
  Status                        21.22    3.00      22.05    2.73
  Altruism                      21.48    3.26      22.55    3.26
  Autonomy                      20.45    2.22      20.55    2.78

EXPRESSED INTEREST              15.79    3.34      15.35    2.29
Table 7.17

Fort Lewis Pilot Test: AVOICE Means and Standard Deviations Separately for
Blacks and Whites

                                   Blacks             Whites
                                  (N = 27)           (N = 65)
AVOICE Scale                    Mean      SD       Mean      SD

REALISTIC
  Basic Interest Item            2.81    1.39       3.26    1.06
  Mechanics                     50.96   12.29      54.20   12.90
  Heavy Construction            67.85   14.10      75.69   14.55
  Electronics                   66.33   14.94      64.20   16.77
  Electronic Communication      23.22    4.37      21.38    5.82
  Drafting                      23.81    5.00      22.46    6.57
  Law Enforcement               48.04   12.22      53.43   10.40
  Audiographics                 25.00    4.58      24.82    5.05
  Agriculture                   14.04    3.49      16.18    3.56
  Outdoors                      29.81    5.12      35.28    5.19
  Marksman                      15.48    3.47      17.54    4.51
  Infantry                      29.37    6.38      32.68    7.41
  Armor/Cannon                  22.26    5.20      24.43    6.43
  Vehicle Operator              29.37    7.42      31.42    6.92
  Adventure                     15.58    3.32      20.11    2.70

CONVENTIONAL
  Basic Interest Item            3.07     .77       2.92     .98
  Office Administration         51.37   10.00      43.65   13.45
  Supply Administration         41.19    8.68      35.72   10.42
  Food Service                  48.74    8.52      41.63   11.04

SOCIAL
  Basic Interest Item            3.22     .92       3.28    1.07
  Teaching/Counseling           25.04    4.61      23.48    5.50

INVESTIGATIVE
  Basic Interest Item            3.11    1.10       3.14     .91
  Medical Services              77.81   12.88      69.35   17.68
  Mathematics                   17.22    4.05      15.22    4.25
  Science/Chemical              29.96    6.58      31.23    9.15
  Automated Data Processing     27.93    3.87      23.63    5.90

ENTERPRISING
  Basic Interest Item            3.30    1.01       3.05    1.14
  Leadership                    21.44    3.82      20.97    4.59

(Continued)
Table 7.17 (Continued)

Fort Lewis Pilot Test: AVOICE Means and Standard Deviations Separately for
Blacks and Whites

                                   Blacks             Whites
                                  (N = 27)           (N = 65)
AVOICE Scale                    Mean      SD       Mean      SD

ARTISTIC
  Basic Interest Item            3.44    1.37       2.88    1.23
  Aesthetics                    15.59    3.29      14.66    4.50

ORGANIZATIONAL CLIMATE/
ENVIRONMENT DIMENSIONS
  Achievement                   20.19    3.40      21.65    2.73
  Safety                        21.22    3.46      22.12    2.85
  Comfort                       37.44    4.27      39.31    3.45
  Status                        21.48    2.69      21.74    2.97
  Altruism                      21.48    3.55      22.18    3.07
  Autonomy                      19.26    2.08      20.95    2.14

EXPRESSED INTEREST              16.00    2.93      15.58    3.30
Table 7.18

Fort Lewis Pilot Test: AVOICE Scale Intercorrelations

VAR

19  AVOICE MARKSMAN
20  AVOICE AGRICULTURE
21  AVOICE MATHEMATICS
22  AVOICE AESTHETICS
23  AVOICE LEADERSHIP
24  AVOICE ELECTRONIC COMMUN.
25  AVOICE AUTOMATED DATA PROC.
26  AVOICE TEACHING/COUNSELING
27  AVOICE DRAFTING
28  AVOICE AUDIOGRAPHICS
29  AVOICE ARMOR/CANNON
30  AVOICE VEHICLE OPERATOR
31  AVOICE OUTDOORS
32  AVOICE INFANTRY
33  AVOICE SCIENCE/CHEMICAL
34  AVOICE SUPPLY ADM.
35  AVOICE OFFICE ADM.
36  AVOICE LAW ENFORCEMENT
37  AVOICE MECHANICS
38  AVOICE ELECTRONICS
39  AVOICE HEAVY CONSTRUCTION
40  AVOICE MEDICAL
41  AVOICE FOOD SERVICE
42  AVOICE HOLLAND INVEST.
43  AVOICE HOLLAND CONVENT.
44  AVOICE HOLLAND ART.
45  AVOICE HOLLAND REAL.
46  AVOICE HOLLAND SOCIAL
47  AVOICE HOLLAND ENTERPR.
48  AVOICE EXPRESSED INTEREST
49  AVOICE ABILITY UTIL.
50  AVOICE ACHIEVEMENT
51  AVOICE ACTIVITY
52  AVOICE ADVANCEMENT
53  AVOICE AUTHORITY
54  AVOICE ORGANIZATION POL.
55  AVOICE COMPENSATION
56  AVOICE CO-WORKERS
57  AVOICE CREATIVITY
58  AVOICE INDEPENDENCE
59  AVOICE MORAL VALUES
60  AVOICE RECOGNITION
61  AVOICE RESPONSIBILITY
62  AVOICE SECURITY
63  AVOICE SOCIAL SERVICE
64  AVOICE SOCIAL STATUS
65  AVOICE SUPERVISION HR
66  AVOICE SUPERVISION TECH
67  AVOICE VARIETY
68  AVOICE WORKING COND.
SUMMARY



The two non-cognitive inventories of the Pilot Trial Battery, the ABLE
and the AVOICE, are designed to measure a total of 20 constructs plus
response validity scales and expressed interest categories. The ABLE
assesses six temperament constructs and the Physical Condition construct
through 11 scales, and also includes four response validity scales. The
AVOICE measures six Holland interest constructs, six Organizational
Environment constructs, and Expressed Interests through 31 scales.
Altogether, the 46 scales of the two inventories included approximately
600 items during the pilot testing phase: 291 ABLE items and 306 AVOICE
items for the Fort Campbell version, and 268 ABLE items and 306 AVOICE
items for the Fort Lewis version.

Evaluation and revision of the inventories took place in three steps.
First, each was subjected to editorial review by both PDRI and ARI prior to
any pilot testing. This review resulted in nearly 200 wording changes and
the deletion of 17 items. The majority of these changes applied to the ABLE.

The second stage of evaluation took place after the Fort Campbell pilot 
testing. Feedback from the soldiers taking the inventory and data analysis 
of the results (e.g., item-total correlations, item response distributions) 
were used to refine the inventories. Twenty-three ABLE items were deleted 
and 173 ABLE items were revised; no AVOICE items were deleted and 20 AVOICE 
items were revised. 

In the third stage of evaluation, after the Fort Lewis pilot testing, 
far fewer changes were made. One ABLE item was deleted, 20 ABLE items were 
revised, and no changes were made to the AVOICE. Throughout the evaluation 
process, it is likely that the AVOICE was less subject to revision because 
it uses a common response format for all items, whereas the response op- 
tions for ABLE items differ by item. 

The psychometric data obtained with both inventories seemed highly
satisfactory; the scales were shown to be reliable and appeared to be
measuring the constructs intended. Sample sizes in these administrations
were fairly small (Fort Campbell N = 52 and 55 for the ABLE and AVOICE,
respectively; Fort Lewis N = 106 and 114 for the ABLE and AVOICE,
respectively), but results were similar in both samples.
CHAPTER 8



NON-COGNITIVE MEASURES: FIELD TESTS

Leaetta M. Hough, Matthew K. McGue, Janis S. Houston,
and Elaine D. Pulakos



In this chapter we describe the field tests of the non-cognitive
measures in the Pilot Trial Battery, the ABLE and the AVOICE, whose
development was described in Chapter 7. Portions of this chapter are drawn
from Hough, Barge, Houston, McGue, and Kamp (1985).

We first discuss the results of the Fort Knox field test in September 
1984, the general procedures for which were described in Chapter 2. We 
also discuss here the procedures and results of the field testing done at 
Fort Bragg, where the ABLE and AVOICE were administered to soldiers under 
several experimental conditions, in order to estimate the extent to which 
scores on these inventories could be "faked" when individuals are in- 
structed to do so. We also describe, in the context of this "fakability" 
study, the procedures and results of the ABLE and AVOICE administration to 
recruits at the Military Entrance Processing Station (MEPS) at 
Minneapolis. 

Figures 8.1 and 8.2 list the entire set of scales, by construct,
contained in the Fort Knox version of the ABLE and AVOICE, respectively.
Chapter 7 presented a complete description of each of these constructs and
scales, with sample items. The two inventories themselves, in the form
administered at Fort Knox, may be found in Appendix G.
Construct                      Scale

Adjustment                     Emotional Stability

Dependability                  Nondelinquency
                               Traditional Values
                               Conscientiousness

Achievement                    Work Orientation
                               Self-Esteem

Physical Condition             Physical Condition

Leadership (Potency)           Dominance
                               Energy Level

Locus of Control               Internal Control

Agreeableness/Likeability      Cooperativeness

Response Validity Scales       Non-Random Response
                               Unlikely Virtues (Social Desirability)
                               Poor Impression
                               Self-Knowledge



Figure 8.1 ABLE scales organized by construct.
Realistic Interests
  Basic Interest Item
  Mechanics
  Heavy Construction
  Electronics
  Electronic Communication
  Drafting
  Law Enforcement
  Audiographics
  Agriculture
  Outdoors
  Marksman
  Infantry
  Armor/Cannon
  Vehicle Operator
  Adventure

Conventional Interests
  Basic Interest Item
  Office Administration
  Supply Administration
  Food Service

Social Interests
  Basic Interest Item
  Teaching/Counseling

Investigative Interests
  Basic Interest Item
  Medical Services
  Mathematics
  Science/Chemical
  Automated Data Processing

Enterprising Interests
  Basic Interest Item
  Leadership

Artistic Interests
  Basic Interest Item
  Aesthetics

Organizational Climate/
Environment Preferences
  Achievement Preferences
  Safety Preferences
  Comfort Preferences
  Status Preferences
  Altruism Preferences
  Autonomy Preferences

Expressed Interests
  Expressed Interests



Figure 8.2 AVOICE scales organized by construct.
ANALYSIS OF DATA FROM FIELD TEST ADMINISTRATION



Results of Data Quality Screening 

In Table 8.1, the data screening results are presented for the Fort 
Knox field test. A total of 290 soldiers completed the ABLE and 287 
soldiers completed the AVOICE. After deletion of inventories with greater 
than 10 percent missing data for both inventories, and deletion of those 
ABLEs where scores on the Non-Random Response Scale (NRRS) were less than 
six, a total of 276 ABLEs and 270 AVOICEs were available for analysis. 

Recall from Chapter 2 that portions of the Pilot Trial Battery were
re-administered to soldiers two weeks after the first administration. As
can be seen in Table 8.1, the total number of "Time 2" ABLE and AVOICE
inventories, after the data quality screens had been applied, was 109 and
127, respectively.
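
The two screens described above can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout (a list of item responses with `None` for a missing answer, plus a count of "correct" Non-Random Response items) and all function names are hypothetical, not the project's actual processing code. The thresholds (more than 10 percent missing; fewer than six NRRS items correct) are the ones reported in the text and in Table 8.1.

```python
from typing import Dict, List, Optional, Sequence


def passes_missing_data_screen(responses: Sequence[Optional[int]],
                               max_missing_fraction: float = 0.10) -> bool:
    """Keep an inventory unless more than 10 percent of its items are missing."""
    missing = sum(1 for r in responses if r is None)
    return missing <= max_missing_fraction * len(responses)


def passes_nrrs_screen(nrrs_correct: int, min_correct: int = 6) -> bool:
    """Keep an ABLE unless fewer than six Non-Random Response items are correct."""
    return nrrs_correct >= min_correct


def screen_able(records: List[Dict]) -> List[Dict]:
    """Apply the missing-data screen first, then the NRRS screen (hypothetical layout)."""
    usable = []
    for rec in records:
        if not passes_missing_data_screen(rec["responses"]):
            continue
        if not passes_nrrs_screen(rec["nrrs_correct"]):
            continue
        usable.append(rec)
    return usable


# Toy records with 20 items each, purely for illustration.
toy_records = [
    {"responses": [1] * 20, "nrrs_correct": 8},               # kept
    {"responses": [1] * 17 + [None] * 3, "nrrs_correct": 8},  # 15% missing: dropped
    {"responses": [1] * 20, "nrrs_correct": 5},               # fails NRRS: dropped
    {"responses": [1] * 18 + [None] * 2, "nrrs_correct": 6},  # exactly 10% missing: kept
]
usable_toy = screen_able(toy_records)
```

The AVOICE has no Non-Random Response Scale, so only the missing-data screen would apply to it.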

Mean Scores and Reliability Estimates

Summary statistics for the non-cognitive measures are presented in
Tables 8.2, 8.3, and 8.4. Several things are noteworthy in Table 8.2. All
the ABLE content scales show adequate score variances (SD ranges from 5.25
to 8.27) and the alpha coefficients are acceptable to excellent in value
(median = .84, range = .70 to .87). In passing, we point out that there
was no particular technical reason for computing alpha coefficients on the
field test data rather than Hoyt coefficients, as was done for the pilot
test data (see Chapter 7). Both procedures provide conceptually identical
estimates of internal consistency reliability and provide nearly identical
mathematical results. Other work on Project A was using the alpha
coefficient procedure, so we decided to use the same procedure for the sake
of greater project-wide consistency. The test-retest coefficients are all at
or greater than acceptable levels (median = .79, range = .68 to .83), and
in most cases are near the same value as the alphas, indicating excellent
stability for these scale scores.
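
For readers unfamiliar with the alpha coefficient mentioned above, the following is a minimal sketch of the standard Cronbach's alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the toy data are illustrative assumptions; this is not the software used on Project A.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(item_scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    item_scores[i][j] is respondent i's score on item j.
    """
    k = len(item_scores[0])                      # number of items in the scale
    columns: List[tuple] = list(zip(*item_scores))
    item_variances = [variance(col) for col in columns]
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Toy example: four respondents, two items (made-up data).
toy_alpha = cronbach_alpha([[1, 1], [2, 3], [3, 2], [4, 4]])
```

Because the formula uses a ratio of variances, it does not matter whether sample or population variances are used, as long as the same one is used throughout.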

The response validity scales have score variances as expected. Unlikely
Virtues and Self-Knowledge scores are nearly normally distributed
with somewhat less variance than the content scales, but still at an
acceptable level. The Non-Random Response and Poor Impression scales show
markedly skewed distributions, as would be expected for subjects responding
attentively and honestly. The alphas for these scales are a bit lower than
for the content scales, again as expected. The test-retest coefficients
are also a bit lower, especially for Non-Random Response. However, the
variance is small on this scale (again, as it should be) and the
distribution is skewed, so even small changes in responses can have a large
effect on this coefficient.

Table 8.3 shows more detail about the test-retest results for the
ABLE. The results for the content scales, which are the most important
scales in terms of predicting job performance and other criteria, are
remarkable for their consistency. There was virtually no change in mean
scores between the two administrations, and the effect sizes are very
small.

The response validity scales appear to be more sensitive to changes 
Table 8.1

Fort Knox Field Test: Data Quality Screen Results

                                             Fort Knox      Fort Knox
                                             Time 1         Time 2

Total N at Sessions                          303            258

ABLE
  N taking this inventory                    290            128
  Number deleted with Overall Missing
    Data Screen (>10%, or 27 items)            9 (3%)         7 (5%)
  Number deleted with NRRS* Screen
    (<6 "correct" out of 8)                    5 (2%)        12 (9%)
  N usable ABLEs                             276 (95%)      109 (85%)

AVOICE
  N taking this inventory                    287            130
  Number deleted with Overall Missing
    Data Screen (>10%, or 31 items)           17 (6%)         3 (2%)
  N usable AVOICEs                           270 (94%)      127 (98%)

*Non-Random Response Scale.
Table 8.2

Fort Knox Field Test: ABLE Scale Score Characteristics
(N = 276 except where otherwise noted)

                                                                          Median
                            No.                          Test-Retest(a)   Item-Scale
Scale                       Items    Mean     SD   Alpha       r              r

Content Scales
  Emotional Stability         29     64.9    8.27   .86       .68            .44
  Self-Esteem                 15     35.1    5.25   .83       .81            .54
  Cooperativeness             24     54.1    6.09   .77       .69            .42
  Conscientiousness           21     48.9    5.90   .81       .73            .43
  Nondelinquency              24     55.4    7.23   .84       .81            .46
  Traditional Values          16     37.2    4.60   .70       .74            .45
  Work Orientation            27     61.2    7.93   .85       .80            .47
  Internal Control            21     50.3    6.14   .79       .75            .43
  Energy Level                25     57.1    7.11   .85       .79            .47
  Dominance                   16     35.5    6.13   .86       .83            .56
  Physical Condition           9     31.1    7.53   .87       .81            .72

Response Validity Scales
  Unlikely Virtues            12     16.6    3.39   .68       .62            .53
  Self-Knowledge              13     29.6    3.54   .62       .71            .41
  Non-Random Response(b)       8      7.7     .77   .56       .37            .45
  Poor Impression             24      1.5    1.86   .61       .56            .33

(a) N = 109 for Test-Retest correlations. Test-Retest interval was two weeks.
(b) N = 281. Statistics reported for this scale are based on sample edited for
    overall Missing Data only. "Passing" score on Non-Random Response Scale ≥ 6.
Table 8.3

Fort Knox Field Test: ABLE Test-Retest Results(a)

                              Mean          Mean
                              Time 1        Time 2        Effect
                              (N = 276)     (N = 109)     Size(c)

Content Scales
  Emotional Stability          64.9          65.1           .02
  Self-Esteem                  35.1          34.8          -.05
  Cooperativeness              54.1          54.3           .04
  Conscientiousness            48.9          48.3          -.10
  Nondelinquency               55.4          55.6           .02
  Traditional Values           37.2          37.9           .15
  Work Orientation             61.2          60.7          -.07
  Internal Control             50.3          50.2          -.01
  Energy Level                 57.1          57.0          -.01
  Dominance                    35.5          34.9          -.09
  Physical Condition           31.1          30.4          -.09

Response Validity Scales
  Unlikely Virtues             16.6          17.5           .27
  Self-Knowledge               29.6          29.0          -.18
  Non-Random Response(b)        7.7           7.2          -.65
  Poor Impression               1.5           1.2          -.18

(a) Test-Retest interval was two weeks.
(b) Based on sample edited for missing data only; N1 = 281 and N2 = 121.
(c) Effect Size = (Mean Time 2 - Mean Time 1)/SD Time 1
Table 8.4

Fort Knox Field Test: AVOICE Scale Score Characteristics
(N = 270 except where otherwise noted)

                                                                             Median
                               Number of                    Test-Retest(a)   Item-Scale
Scale                          Items     Mean     SD   Alpha      r              r

Marksman                          5      15.8    4.37   .79      .77            .75
Agriculture                       5      14.1    3.99   .68      .69            .70
Mathematics                       5      15.1    4.37   .82      .76            .79
Aesthetics                        5      14.3    4.17   .77      .72            .74
Leadership                        6      20.3    4.70   .81      .56            .74
Electronic Communication          7      21.1    5.73   .92      .78            .72
Automated Data Processing         7      23.4    6.56   .88      .81            .81
Teaching/Counseling               7      22.8    5.53   .82      .73            .73
Drafting                          7      21.5    6.12   .85      .74            .77
Audiographics                     7      23.8    5.68   .82      .76            .70
Armor/Cannon                      8      22.4    6.57   .85      .74            .69
Vehicle/Equipment Operator       10      28.1    7.79   .86      .69            .70
Outdoors                          9      31.7    6.41   .79      .69            .66
Infantry                         10      29.1    7.13   .81      .78            .65
Science/Chemical Operations      11      29.4    8.93   .89      .79            .71
Supply Administration            13      35.0   10.44   .92      .82            .75
Office Administration            16      45.2   13.20   .94      .86            .73
Law Enforcement                  16      48.1   11.84   .88      .78            .63
Mechanics                        16      50.0   14.68   .95      .80            .80
Electronics                      20      60.0   17.48   .96      .74            .77
Heavy Construction/Combat        25      65.8   17.90   .94      .76            .70
Medical Services                 24      68.5   18.79   .95      .84            .69
Food Service                     17      48.2   11.16   .89      .71            .64

(a) N = 127 for Test-Retest correlations.



due to time or due to a second administration. The change in mean scores
is greater than for the content scales and the effect sizes are somewhat
larger. Still, the changes are not large except for the Non-Random
Response score. The change in this mean score indicates that more subjects
responded less attentively the second time around, which is perhaps not
surprising. We point out that the Non-Random Response Scale did "catch"
this phenomenon, exactly as it was supposed to, and roughly four times as
many subjects "failed" this scale on the second administration as did on
the first (9 percent vs. 2 percent, see Table 8.1). Overall we find these
results reassuring with respect to the way the content and response
validity scales were designed to function.
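
The effect sizes discussed above are standardized mean differences that use the Time 1 standard deviation as the yardstick. The sketch below recomputes a few of them from the means in Table 8.3 and the Time 1 SDs in Table 8.2. Note that the signs of the tabled values correspond to Time 2 minus Time 1 in the numerator, so that is the convention assumed here; the function name is illustrative.

```python
def effect_size(mean_time1: float, mean_time2: float, sd_time1: float) -> float:
    """Standardized change from Time 1 to Time 2, in Time 1 SD units."""
    return (mean_time2 - mean_time1) / sd_time1


# (Time 1 mean, Time 2 mean, Time 1 SD) as read from Tables 8.2 and 8.3.
scales = {
    "Traditional Values": (37.2, 37.9, 4.60),
    "Physical Condition": (31.1, 30.4, 7.53),
    "Unlikely Virtues": (16.6, 17.5, 3.39),
}

effect_sizes = {name: round(effect_size(*vals), 2) for name, vals in scales.items()}
for name, es in effect_sizes.items():
    print(f"{name}: {es:+.2f}")
```

Small discrepancies in the second decimal between values recomputed this way and the tabled values are to be expected, because the published means are rounded to one decimal place.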

Table 8.4 shows that the AVOICE scales are also functioning well.
Scale score statistics show adequate variance (SD ranges from 3.99, for a
scale with a possible score range of 5-25, to 18.79, for a scale with a
possible score range of 24-120). Alpha coefficients vary from .68 to .96
with a median of .86, with the lower values occurring for the scales with
fewer items, as would be expected. The median item-total scale score
correlations are all very high (.60s to .70s), also indicating good
internal consistency. Finally, the test-retest coefficients are also
acceptable to excellent in value (median = .76, range = .56 to .86).

The results shown in Tables 8.2, 8.3, and 8.4 and discussed above lead 
to the conclusion that the non-cognitive scales are very sound with regard 
to basic psychometric criteria of sufficient score variance and distribu- 
tion, internal consistency, and stability. 

Uniqueness Estimates for Non-Cognitive Measures

Scales on both the ABLE and the AVOICE were examined for their potential
for providing incremental validity to the predictor battery. Uniqueness
estimates were computed identically to those described for the cognitive
measures in Chapter 4, by subtracting the squared multiple correlation
obtained when the test of interest is regressed on a set of other tests
(e.g., the ASVAB) from the reliability estimate for the test of interest
($U^2 = r_{xx} - R^2$). Uniqueness is, then, the amount of reliable
variance for a test not shared with the tests against which it has
been regressed.
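
As a worked illustration of the uniqueness computation just described, the sketch below subtracts the adjusted squared multiple correlation from the reliability estimate for three scales, using values reported in Tables 8.5 and 8.6. The function name is an assumption made for this example.

```python
def uniqueness(reliability: float, r_squared: float) -> float:
    """Reliable variance of a scale not shared with the predictor set."""
    return reliability - r_squared


# (alpha, adjusted R-squared against the ASVAB) as reported in Tables 8.5 and 8.6.
examples = {
    "ABLE Emotional Stability": (0.86, 0.05),
    "ABLE Dominance": (0.86, 0.00),
    "AVOICE Mechanics": (0.95, 0.32),
}

u2 = {name: round(uniqueness(rel, r2), 2) for name, (rel, r2) in examples.items()}
for name, value in u2.items():
    print(f"{name}: U^2 = {value:.2f}")
```

The same function applies when the test-retest coefficient is used as the reliability estimate instead of alpha, which is how the "Using T-R" columns of the tables are obtained.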

Tables 8.5 and 8.6 present the uniqueness estimates for the ABLE and
AVOICE scales, respectively, when regressed against the ASVAB. The median
U² for the ABLE is .80, and the estimates range from .69 to .87, indicating
that the ABLE overlaps very little with the ASVAB. For the AVOICE, the
uniqueness estimates range from .59 to .95, indicating that the
AVOICE also overlaps very little with the ASVAB.

Table 8.7 contains a summary of the correlations between the ABLE and
the AVOICE, and the other measures in the Pilot Trial Battery. As can be
seen here, the ABLE and AVOICE share very little variance with the
cognitive and psychomotor tests in the Pilot Trial Battery.

Factor Analysis of ABLE and AVOICE Scales

The ABLE content scales and the AVOICE scales were separately factor 
analyzed, and, in both cases, a two-factor solution appeared to best sum- 
marize the data. Table 8.8 shows the factor loading matrix for the ABLE 
Table 8.5

Uniqueness Estimates for 11 ABLE Scales in the Pilot Trial Battery
Against Other ABLE Scores and Against ASVAB

                                                      ABLE       ASVAB      ASVAB U²       ASVAB U²
                        Number              Retest    Adj R²     Adj R²     Using Alpha    Using T-R
Scale                   of Items   Alpha    (N=109)   (N=207)    (N=183)    (N=183)        (N=183)

Emotional Stability        29       .86      .68       .52        .05         .81            .63
Self-Esteem                15       .83      .81       .70        .03         .80            .78
Cooperativeness            24       .77      .69       .54        .00         .77            .69
Conscientiousness          21       .81      .73       .64        .03         .78            .70
Nondelinquency             24       .84      .81       .63        .02         .82            .79
Traditional Values         16       .70      .74       .50        .01         .69            .73
Work Orientation           27       .85      .80       .71        .03         .82            .77
Internal Control           21       .79      .75       .48        .04         .75            .71
Energy Level               25       .85      .79       .72        .05         .80            .74
Dominance                  16       .86      .83       .50        .00         .86            .83
Physical Condition          9       .87      .81       .11        .00         .87            .81
Table 8.6

Uniqueness Estimates for 24 AVOICE Scales in the Pilot Trial Battery
Against ASVAB

                                                      Test-
                                                      Retest     ASVAB      ASVAB U²       ASVAB U²
                               Number of   Alpha      r          Adj R²     Using Alpha    Using T-R
Scale                          Items       (N=270)    (N=127)    (N=149)    (N=149)        (N=149)

Marksman                          5         .79        .77        .20         .59            .57
Agriculture                       5         .68        .69        .06         .62            .63
Mathematics                       5         .82        .76        .02         .80            .74
Aesthetics                        5         .77        .72        .08         .69            .64
Leadership                        6         .81        .56        .00         .81            .56
Electronic Communication          7         .92        .78        .01         .91            .77
Automated Data Processing         7         .88        .81        .00         .88            .81
Teaching/Counseling               7         .82        .73        .00         .82            .73
Drafting                          7         .85        .74        .07         .78            .67
Audiographics                     7         .82        .76        .00         .82            .76
Armor/Cannon                      8         .83        .74        .11         .72            .63
Vehicle/Equipment Operator       10         .86        .69        .14         .72            .55
Outdoors                          9         .79        .69        .15         .63            .53
Infantry                         10         .81        .78        .12         .68            .65
Science/Chemical Operations      11         .89        .79        .01         .88            .78
Supply Administration            13         .92        .82        .00         .92            .82
Office Administration            16         .94        .86        .03         .91            .83
Law Enforcement                  16         .88        .78        .02         .86            .76
Mechanics                        16         .95        .80        .32         .63            .48
Electronics                      20         .96        .74        .14         .82            .60
Heavy Construction/Combat        23         .94        .76        .21         .73            .55
Medical Services                 24         .95        .84        .00         .95            .84
Food Service                     17         .89        .71        .02         .87            .69
Adventure                        14         .96        .86        .26         .70            .60
Table 8.7

Summary of Overlap of Non-Cognitive Measures With Other
Pilot Trial Battery Measures



1. Between ABLE and PTB Cognitive Paper-and-Pencil Tests:

• Only 19%, 29 of 150 correlations, are significant at p<.05.

• The highest correlation is .23.

2. Between ABLE and PTB Computer-Administered Measures:

• Only 17%, 48 of 285 correlations, are significant at p<.05.

• The highest correlation is .24.

3. Between AVOICE and PTB Cognitive Paper-and-Pencil Tests:

• Only 36%, 128 of 360 correlations, are significant at p<.05.

• The highest correlation is .32.

4. Between AVOICE and PTB Computer-Administered Measures:

• Only 15%, 105 of 684 correlations, are significant at p<.05.

• The highest correlation is .30.
Table 8.8

Fort Knox Field Test: ABLE Factor Analysis(a)
(N = 276)

                              I                   II
Scale                         Personal Impact     Dependability      h²

Self-Esteem                        .80                 .30           .73
Energy Level                       .73                 .46           .74
Dominance (Leadership)             .72                 .13           .54
Emotional Stability                .67                 .26           .52
Work Orientation                   .67                 .51           .71
Nondelinquency                     .20                 .81           .70
Traditional Values                 .19                 .73           .57
Conscientiousness                  .39                 .72           .67
Cooperativeness                    .46                 .60           .57
Internal Control                   .44                 .50           .44
                                                                    6.19

Note: h² = communality, the sum of squared factor loadings for a variable.
(a) Principal factor analysis, varimax rotation.
content scales. Note first that the communalities for the scales are
fairly high, indicating that the scales do share substantial common
variance.



The first factor was labeled Personal Impact since the scales loading
on the factor, in concert, suggest that persons scoring high on the factor
would have high self-esteem, exhibit a high level of energy, could exert
leadership, would appear emotionally stable, and would be work oriented.
Note that two of the scales loading highest on this factor also have
substantial loadings on the second factor: Energy Level (.46) and Work
Orientation (.51). Also, three of the scales loading highest on the second
factor had substantial loadings here: Cooperativeness (.46), Internal
Control (.44), and Conscientiousness (.39).

The second factor was named Dependability. Scale loadings for this 
factor suggest that a high scorer on this factor would be a strong rule 
abider, a believer in traditional societal values, show conscientiousness, 
be cooperative, and believe that life's circumstances were largely under an 
individual's control. Again, keep in mind the scales that show high 
loadings on both factors (as noted in the above paragraph). 

This two-factor solution seems to us to make good intuitive sense for 
characterizing soldiers as well as possessing a fair amount of practical 
appeal. Being able to identify soldiers with high personal impact or 
leadership potential and a high degree of dependability would seem to be a 
potentially valuable contribution. 

The solution found in these field test data differs from the pilot
test solution primarily in the number of factors that characterize the best
solution. Two factors were viewed as best here, whereas a larger number of
factors were viewed as best in the pilot test solutions (see Table 7.11).
The most probable reason for this difference is the difference in the two
samples. The field test results are based on a sample roughly two and
one-half times as large, and one that is probably more representative in
terms of diversity of MOS as well. Therefore, we think the field test data
are "better" data to interpret.

Table 8.9 shows the results for the factor analysis of the AVOICE.
The scale communalities for this AVOICE solution are a bit lower than those
for the ABLE, but still indicate a substantial amount of common variance
for the set of scales. (Sixty-two percent of the total ABLE scale variance
is in common, compared to 54 percent for the AVOICE.)
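
The communalities and the percentages of common variance quoted above follow directly from the factor loadings. As a hedged sketch (function names are illustrative, and the inputs are the values printed in Tables 8.8 and 8.9), a communality is the sum of a scale's squared loadings, and the proportion of common variance is the sum of the communalities divided by the number of scales:

```python
from typing import Sequence


def communality(loadings: Sequence[float]) -> float:
    """Sum of squared factor loadings for one scale."""
    return sum(loading ** 2 for loading in loadings)


def proportion_common_variance(communalities: Sequence[float]) -> float:
    """Share of total standardized scale variance accounted for by the factors."""
    return sum(communalities) / len(communalities)


# Loadings on (Personal Impact, Dependability) as printed in Table 8.8.
energy_level_h2 = round(communality([0.73, 0.46]), 2)
nondelinquency_h2 = round(communality([0.20, 0.81]), 2)

# Communalities of the ten ABLE content scales as printed in Table 8.8.
able_h2 = [0.73, 0.74, 0.54, 0.52, 0.71, 0.70, 0.57, 0.67, 0.57, 0.44]
able_common = round(proportion_common_variance(able_h2), 2)

# For the AVOICE, Table 8.9 reports a communality total of 12.49 over 23 scales.
avoice_common = round(12.49 / 23, 2)
```

Under these inputs the two proportions reproduce the 62 percent and 54 percent figures cited in the text.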

The two factors found here were named Combat Support and
Combat-Related. The former is defined largely by scales that have to do
with jobs or services that support the actual combat specialties, while the
latter is defined by scales that, for the most part, are much more related
to specialties that engage directly in combat.

Also, as found with the ABLE, several scales show substantial loadings
on both factors. Most of these occur for scales loading highest on the
first factor, and include Science/Chemical Operations (.43 on second
factor), Electronic Communication (.36), Leadership (.35), and Drafting
(.34). Only one scale loading highest on the second factor has a
substantial loading on the first factor, Electronics (.45).
Table 8.9

Fort Knox Field Test: AVOICE Factor Analysis(a)
(N = 270)

                                   I                  II
                                   Combat             Combat-
Scale                              Support(b)         Related(c)        h²

Office Administration                 .85               -.13           .73
Supply Administration                 .78                .11           .62
Teaching/Counseling                   .76                .11           .59
Mathematics                           .74                .09           .55
Medical Services                      .73                .18           .57
Automated Data Processing             .71                .10           .51
Audiographics                         .64                .17           .44
Electronic Communication              .64                .36           .54
Science/Chemical Operations           .61                .43           .55
Aesthetics                            .61                .04           .37
Leadership                            .58                .35           .46
Food Service                          .54                .19           .33
Drafting                              .54                .34           .41
Infantry                              .10                .85           .74
Armor/Cannon                          .13                .84           .73
Heavy Construction/Combat             .17                .84           .73
Outdoors                              .02                .74           .55
Mechanics                             .17                .74           .58
Marksman                              .05                .73           .54
Vehicle/Equipment Operator            .17                .73           .56
Agriculture                           .18                .64           .44
Law Enforcement                       .27                .61           .44
Electronics                           .45                .57
                                                                     12.49

Note: h² = communality, the sum of squared factor loadings for a variable.
(a) Principal factor analysis, varimax rotation.
(b) Conventional, Social, Investigative, Enterprising, Artistic constructs.
(c) Realistic construct.
The remarks made above about the comparison of ABLE factor analyses of
the pilot and field test data apply equally here. Again, we think the 
field test data are probably the better set of results in terms of the 
representativeness of the samples. 

Finally, as with the ABLE, we think the two-factor AVOICE solution 
makes good intuitive sense and has practical appeal. It would seem to be 
helpful to be able to characterize applicants as having interests primarily 
in the combat MOS or in MOS supporting combat specialties, perhaps even at 
the point of recruitment as opposed to the selection or in-processing 
point. 
FAKABILITY INVESTIGATIONS



As discussed previously, in addition to the content scales, there were
four response validity scales on the ABLE: Non-Random Response, Unlikely
Virtues (Social Desirability), Poor Impression, and Self-Knowledge. An
investigation, including an experiment, was undertaken on intentional
distortion (faking) of responses. Data were gathered for this study from (1)
soldiers instructed, at different times, to distort their responses and to
be honest (experimental data gathered at Fort Bragg); (2) soldiers who were
simply responding to the ABLE and AVOICE with no particular directions
(data gathered at Fort Knox, in another type of "honest" condition); and
(3) recently sworn-in Army recruits at the Minneapolis Military Entrance
Processing Station (MEPS).

Purposes of the Faking Study 

The purposes of the faking study were to determine: 

• The extent to which soldiers can distort their responses to
  temperament and interest inventories when instructed to do so.
  (Compare data from Fort Bragg faking conditions with Fort Bragg and
  Fort Knox honest conditions.)

• The extent to which the ABLE response validity scales detect such
  intentional distortion. (Compare response validity scales in Fort
  Bragg honest and faking conditions.)

• The extent to which ABLE validity scales can be used to correct or
  adjust scores for intentional distortion.

• The extent to which distortion might be a problem in an applicant
  setting. (Compare MEPS data with Fort Bragg and Fort Knox data.)

The participants in the experimental group were 425 enlisted soldiers
in the 82nd Airborne brigade at Fort Bragg in September 1984. Comparison
samples were new recruits at a MEPS, in an approximation of an applicant
setting (N = 126), and Fort Knox soldiers described earlier (N = 276).

Procedure and Design 



Four faking conditions were created:

• Fake Good on the ABLE
• Fake Bad on the ABLE
• Fake Combat on the AVOICE
• Fake Noncombat on the AVOICE

Two honest conditions were created:

• Honest on the ABLE
• Honest on the AVOICE
The significant parts of the instructions for the six conditions were
as follows:

• ABLE - Fake Good

Imagine you are at the Military Entrance Processing Station
(MEPS) and you want to join the Army. Describe yourself in a way
that you think will ensure that the Army selects you.

• ABLE - Fake Bad

Imagine you are at the Military Entrance Processing Station (MEPS)
and you do not want to join the Army. Describe yourself in a way
that you think will ensure that the Army does not select you.

• ABLE - Honest

You are to describe yourself as you really are.

• AVOICE - Fake Combat

Imagine you are at the Military Entrance Processing Station (MEPS).
Please describe yourself in a way that you think will ensure that
you are placed in an occupation in which you are likely to be
exposed to combat during a wartime situation.

• AVOICE - Fake Noncombat

Imagine you are at the Military Entrance Processing Station (MEPS).
Please describe yourself in a way you think will ensure that you
are placed in an occupation in which you are unlikely to be exposed
to combat during a wartime situation.

• AVOICE - Honest

You are to describe yourself as you really are.

The design was repeated measures with faking and honest conditions
counterbalanced. Thus, approximately half the experimental group, 124
soldiers, completed the inventories honestly in the morning and faked in
the afternoon, while the other half (121) completed the inventories
honestly in the afternoon and faked in the morning.

The experimental design and the numbers of soldiers from whom we
gathered the intentional faking data appear in Table 8.10. In summary, a
2 x 2 x 2 fixed-factor, completely crossed experimental design was used.
The within-subjects factor, called "Fake," consisted of two levels (honest
responses and faked responses). The first between-subjects factor, called
"Set," consisted of the following two levels: Fake Good (for the ABLE)/
Want Combat (for the AVOICE) and Fake Bad (for the ABLE)/Do Not Want Combat
(for the AVOICE). Order was manipulated in the second between-subjects
factor such that the following two levels were produced: faked responses
before honest responses, and honest responses before faked responses.
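
The crossing of the two between-subjects factors can be laid out as a small lookup table. This is a minimal sketch, not part of the original analysis: the dictionary layout and function name are hypothetical, and the assignment of the "complete sets" counts from Table 8.10 to cells is an assumption based on the order in which they appear.

```python
from itertools import product

# Between-subjects factors. Every soldier also supplies both levels of the
# within-subjects factor "Fake" (honest responses and faked responses).
SETS = ("Fake Good/Want Combat", "Fake Bad/Do Not Want Combat")
ORDERS = ("honest first", "fake first")

# Complete sets per cell as read from Table 8.10 (assumed cell assignment).
complete_sets = {
    ("Fake Good/Want Combat", "honest first"): 62,
    ("Fake Bad/Do Not Want Combat", "honest first"): 62,
    ("Fake Good/Want Combat", "fake first"): 61,
    ("Fake Bad/Do Not Want Combat", "fake first"): 60,
}


def order_total(order):
    """Number of soldiers with complete data for one level of Order."""
    return sum(n for (_, o), n in complete_sets.items() if o == order)


for cell in product(SETS, ORDERS):
    print(cell, complete_sets[cell])

print(order_total("honest first"), order_total("fake first"))  # 124 121
```

The two Order totals agree with the 124 and 121 soldiers reported in the text.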
Table 8.10

Faking Experiment, ABLE and AVOICE: Fort Bragg

                                                            Complete
Day         Session   AVOICE           ABLE          N      Sets

            AM        Honest           Honest        64     62
            PM        Fake Combat      Fake Good     62

Tuesday     AM        Honest           Honest        62     62
            PM        Fake Noncombat   Fake Bad      62

Wednesday   AM        Fake Combat      Fake Good     63     61
            PM        Honest           Honest        61

Thursday    AM        Fake Noncombat   Fake Bad      61     60
            PM        Honest           Honest        60



Faking Study Results - Temperament Inventory



We performed a multivariate analysis of variance (MANOVA) on the
experimental data from Fort Bragg. Table 8.11 shows the findings for the
interactions, the sources of variance most relevant to the question of
whether soldiers can or cannot intentionally distort their responses.

As can be seen, all the Fake x Set interactions are significant,
indicating that soldiers can, when instructed to do so, distort their
responses.

Table 8.11 also shows that, for the Fake x Set x Order interaction
effect, the overall test of significance is statistically significant for
the response validity scales and marginally significant for the content
scales. These results indicate that the order of experimental conditions
in which the participant completed the ABLE affected the results. Table
8.12 shows in greater detail the effects of intentional distortion; it
shows the mean scores for the various experimental conditions for the
content scales. This table and the remaining tables showing Fort Bragg
ABLE results report the values for the soldier responses on the first
administration of the particular condition. For example, the mean value of
66.1 for Emotional Stability in the Honest First column of Table 8.12 was
computed on 120 soldiers who completed the ABLE under the Honest condition
before they completed the ABLE under a Fake condition (either Good or Bad).
Similarly, the mean value of 70.3 for Emotional Stability in the Fake Good
First column of Table 8.12 was computed on 54 soldiers who completed the
ABLE under the Fake Good condition before they completed the ABLE under the
Honest condition.

In general, Table 8.12 shows scores are higher on all the content
scales when subjects are instructed to fake good (about .5 SD on average),
and, to a much greater extent, scores are lower on the content scales when
subjects are instructed to fake bad (about 2 SDs on average).
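
The effect sizes behind these summary figures can be approximated from the tabled Ns, means, and SDs. The sketch below is an illustration under a stated assumption: it divides the mean difference by a pooled standard deviation of the two groups, which happens to reproduce the tabled Emotional Stability values but is not confirmed as the authors' exact denominator. Function names are hypothetical.

```python
import math


def pooled_sd(n1, sd1, n2, sd2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def effect_size(n_h, m_h, sd_h, n_f, m_f, sd_f):
    """(Mean Honest - Mean Fake) / pooled SD; the pooling is an assumption."""
    return (m_h - m_f) / pooled_sd(n_h, sd_h, n_f, sd_f)


# Emotional Stability, first-administration values (N, M, SD) from Table 8.12,
# with the Honest N of 120 taken from the text.
honest = (120, 66.1, 7.8)
fake_good = (54, 70.3, 10.2)
fake_bad = (54, 50.1, 10.8)

print(round(effect_size(*honest, *fake_good), 2))  # -0.49
print(round(effect_size(*honest, *fake_bad), 2))   # 1.81
```

A negative value means the faked scores were higher than the honest scores, so Fake Good yields negative effect sizes and Fake Bad yields positive ones.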

Another research question was the extent to which our response
validity scales detected intentional distortion. As can be seen in Table
8.13, the response validity scale Unlikely Virtues (Social Desirability)
detects Faking Good on the ABLE; the response validity scales Non-Random
Response, Poor Impression, and Self-Knowledge detect Faking Bad. According
to these data, the soldiers responded more randomly, created a poorer
impression, and reported that they knew themselves less well when told to
describe themselves in a way that would increase the likelihood that they
would not be accepted into the Army.

We also examined the extent to which we could use the response
validity scales Unlikely Virtues (Social Desirability) and Poor Impression
to adjust ABLE content scale scores for Faking Good and Faking Bad. We
regressed out Social Desirability from the content scales in the Fake Good
condition and Poor Impression from the content scales in the Fake Bad
condition. Table 8.14 shows the adjusted mean differences in content
scales after regressing out Social Desirability and Poor Impression.
Comparing these differences to the unadjusted differences shown in Table
8.12 clearly shows that these response validity scales can be used to
adjust content scales. However, two important unknowns remain: Do the adjustment






Table 8.11 

Fakability Study, MANOVA Results for ABLE Scales: Fort Bragg



Type and Name of Scale

Response Validity Scalesᵃ

Overall

Unlikely Virtues (Social
Desirability)

Self-Knowledge

Non-Random Response

Poor Impression
Content Scalesᵇ
Overall

Emotional Stability 

Self-Esteem 

Cooperativeness 

Conscientiousness 

Nondelinquency 

Traditional Values 

Work Orientation 

Internal Control 

Energy Level 

Dominance (Leadership) 



Interactions 
Fake x Set Fake x Set x Order 



S 
S 
S 
S 

S 
S 
S 
S 
S 
S 
S 
S 
S 
S 
S 



s 

NS 
NS 
NS 

NS* 



Note: S = significant, p < .01.

NS = nonsignificant, p > .01.

* = marginally significant, .05 > p > .01.

ᵃ Sample size for Response Validity Scales is 219.

ᵇ Sample size for Content Scales is 208.
Table 8.12

Honesty and Faking Effects, ABLE Content Scales: Fort Bragg

                                                                                   Estimated Effect Size
                          Honest Firstᵃ      Fake Good Firstᵃ    Fake Bad Firstᵃ    Honest      Honest
Scale                     N     M      SD    N     M      SD     N     M      SD    vs. Good    vs. Bad

Emotional Stability       120   66.1   7.8   54    70.3   10.2   54    50.1   10.8   -.49       1.81
Self-Esteem               115   34.8   4.7   54    38.2    5.4   54    22.2    5.8   -.69       2.48
Cooperativeness           121   53.2   6.3   54    55.5    8.8   54    36.7   10.4   -.32       2.12
Conscientiousness         116   46.3   5.8   54    49.6    8.4   54    31.7    8.7   -.49       2.13
Nondelinquency            11*   53.1   6.2   54    54.8   10.2   54    36.8    9.6   -.22       2.19
Traditional Values        116   36.7   4.6   54    38.7    6.5   54    23.6    6.1   -.38       2.56
Work Orientation          120   59.3   7.6   54    64.7   10.3   54    40.8   11.7   -.63       2.04
Internal Control          115   49.5   6.3   54    50.9    8.2   54    35.6    8.9   -.20       1.92
Energy Level              116   57.5   6.9   54    61.4    9.1   54    37.9    9.9   -.51       2.46
Dominance (Leadership)    116   35.6   5.6   54    40.3    5.6   54    24.5    6.6   -.84       1.87
Physical Condition        116   33.0   7.4   54    35.4    7.7   54    18.3    8.6   -.32       1.88

ᵃ Mean scores are based on persons who responded to this condition first.



Table 8.13

Honesty and Faking Effects, ABLE Response Validity Scales: Fort Bragg

                                                                                    Effect Size
ABLE Response             Honest Firstᵃ      Fake Good Firstᵃ    Fake Bad Firstᵃ    Honest vs.   Honest vs.
Validity Scale            N     M      SD    N     M      SD     N     M      SD    Fake Good    Fake Bad

Unlikely Virtues
  (Social Desirability)   109   IS.ft  3.1   57    20.1   5.8    56    17.8   4.8                -.53
Self-Knowledge            109   29.6   3.6   57    29.7   4.1    56    21.8   5.2    -.03        l"T8ri
Non-Random Response       109    7.6   1.0   57     7.Z   1.8    56     2.8   2.2     .45
Poor Impression           109    1.5   2.1   57     1.7   2.2    56    U.6    7.9    -.09

ᵃ Values are based on the sample that completed the questionnaires under the condition of interest first.
Table 8.14

Effects of Regressing Out Two Response Validity Scales (Social Desirability and
Poor Impression) on Faking Condition, ABLE Content Scale Scores: Fort Bragg

                                    Fake Good                                Fake Bad
                          Adjusted          Correlation          Adjusted          Correlation
                          Standardized      with Social          Standardized      with Poor
Content Scales            Mean Differenceᵃ  Desirabilityᵇ        Mean Differenceᵃ  Impressionᵇ

Emotional Stability        •.U               .14                  -.14              -.41
Self-Esteem                -.64              .19                   .77              -.40
Cooperativeness             .06              .30                   .38              -.47
Conscientiousness          -.17              .31                   .31              -.38
Nondelinquency              .13              .31                   .63              -.42
Traditional Values         -.24              .25                  1.00              -.40
Work Orientation           -.33              .30                   .32              -.38
Internal Control            .03              .15                   .22              -.44
Energy Level               -.12              .24                   .45              -.41
Dominance (Leadership)     -.63              .25                   .32              -.38
Physical Condition         -.07              .20                   .35              -.39

ᵃ Standardized mean differences are [Mean (Honest) minus Mean (Fake)]/SD (Honest).

ᵇ Correlations are average of correlations for first administration under Honest and relevant Fake condition.



formulas developed on these data cross-validate, and do they increase
criterion-related validity?

Overall, the ABLE data from the Fort Bragg faking study show that:

1. Soldiers can distort their responses when instructed to do so.

2. The response validity scales detect intentional faking; Unlikely
Virtues (Social Desirability) detects Faking Good, and Non-Random
Response, Poor Impression, and Self-Knowledge detect Faking Bad.

3. An individual's Unlikely Virtues (Social Desirability) scale
score can be used to adjust his or her content scale scores to
reduce variance associated with faking good; an individual's Poor
Impression scale score can be used to adjust his or her content
scale scores to reduce variance associated with faking bad.
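
The adjustment described in point 3 amounts to removing the part of a content scale score that is linearly predictable from a response validity scale. The report does not give the adjustment formulas, so the sketch below is only one plausible reading: a single-predictor least-squares regression whose residuals are re-centered on the original mean. The function name and the six scores are hypothetical and are not data from the study.

```python
def regress_out(content, validity):
    """Return content scores with the linear effect of validity removed.

    Fits content = a + b * validity by ordinary least squares and returns
    the residuals re-centered on the original content mean.
    """
    n = len(content)
    mean_c = sum(content) / n
    mean_v = sum(validity) / n
    sxy = sum((v - mean_v) * (c - mean_c) for v, c in zip(validity, content))
    sxx = sum((v - mean_v) ** 2 for v in validity)
    slope = sxy / sxx
    return [c - slope * (v - mean_v) for c, v in zip(content, validity)]


# Hypothetical scores for six soldiers (illustration only).
social_desirability = [12, 15, 16, 18, 21, 24]
emotional_stability = [60, 63, 66, 67, 72, 74]

adjusted = regress_out(emotional_stability, social_desirability)
print([round(score, 1) for score in adjusted])
```

By construction the adjusted scores keep the original mean and are uncorrelated with the validity scale, which is the sense in which variance associated with faking is reduced.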

Faking in an Applicant Setting

MEPS "Applicant" Sample. Another of the purposes of the fakability
study was to determine the extent to which intentional distortion actually
is a problem in an applicant setting. To investigate this question, the
ABLE and AVOICE were administered at the Minneapolis MEPS. However, the
sample of 126 recruits who completed the inventories were not true
"applicants," in that they had just recently been sworn into the Army.

MEPS Procedures. To approximate the applicant response set as closely
as was possible with this sample, recruits were allowed to believe that
their scores on these inventories might affect their Army careers. This
was accomplished by deleting all references in the standard Privacy Act
Statement (given to all subjects at the beginning of a testing session) to
these data being collected for research purposes only, and not having any
effect on the participant's career or status in the Army. Recruits were
then asked to complete the ABLE and AVOICE, after which they were
debriefed. In the debriefing each recruit was asked to read the debriefing
form displayed as Figure 8.3, and the administrator orally summarized the
information on this form and answered any questions the recruit might have.

To examine the extent to which recruits actually believed their ABLE
and AVOICE scores would have an effect on their Army career, each recruit
filled out the single-item form shown in Figure 8.4 prior to debriefing.
Of the 126 recruits in this sample, 57 responded "yes" to this question, 61
said "no," and 8 wrote in that they didn't know. Thus, while the MEPS
sample is not a true "applicant" sample, its make-up (recently sworn-in
recruits, close to half of whom believe their ABLE and AVOICE scores will
affect their Army career) is reasonably close. The response set for this
sample is almost certainly more similar to that of the applicant population
than is the Fort Knox sample.

MEPS Results Compared With Fort Knox and Fort Bragg Data. Table 8.15
shows mean scores for MEPS recruits and the two "Honest" conditions of this
study at Fort Bragg and Fort Knox. Even though the recruits are probably
trying not to create a poor impression (MEPS Poor Impression mean is 1.05,




Debriefing Form

Description of How Results from This Test Session Will Be Used.

"The tests you have just completed are still in the experimental
stages. Thus, information you have provided today will in no
way influence your career in the Army. In fact, no military
personnel will be able to look up your scores on these measures. The
information you have provided will be used for research purposes
only.

If you have any questions about the tests or the test session,
please ask the test administrators.

Thank you very much for your participation."



Figure 8.3 Debriefing Form used in the faking study at the Military
Entrance Processing Station (MEPS).



Minneapolis

MEPS



Name:

SS#:

Do you think your answers to these questionnaires will have an effect on
decisions that the Army makes regarding your future?

Yes

No



Figure 8.4 Form filled out by MEPS recruits before debriefing.
Table 8.15

Comparison of Results From Fort Bragg Honest, Fort Knox, and MEPS (Recruits)
ABLE Scales

                           Fort Bragg        MEPS
                           Honestᵃ           (Recruits)       Fort Knox        Total    Degrees of
ABLE Scale                 N     Mean        N     Mean       N     Mean       SD       Freedom       F       p

Response Validity Scales
  Social Desirability
    (Unlikely Virtues)     116   15.91       121   16.63      276   16.60      3.21     2,510         2.15    .12
  Self-Knowledge           116   29.54       121   28.03      276   29.64      3.63     2,510         9.10    .00
  Non-Random Response      116    7.5«       121    7.79      276    7.75       .64     2,510         3.73    .02
  Poor Impression          116    1.50       121    1.05      276    1.54      1.84     2,510         3.15    .04

Content Scales
  Emotional Stability      112   66.22       118   66.03      272   65.05      7.86     2,499         1.15    .31
  Self-Esteem              112   34.77       118   34.04      272   35.12      5.00     2,499         1.93    .15
  Cooperativeness          112   53.33       118   54.60      272   54.19      6.05     2,499         1.34    .26
  Conscientiousness        112   46.37       118   46.49      272   48.97      5.86     2,499        12.24    .00
  Nondelinquency           112   53.24       118   54.36      272   55.49      6.91     2,499         4.48    .01
  Traditional Values       112   36.67       118   36.97      272   37.28      4.50     2,499          .77    .46
  Work Orientation         112   59.71       118   58.37      272   61.40      7.73     2,499         6.90    .00
  Internal Control         112   49.48       118   51.90      272   50.37      6.13     2,499         4.75    .01
  Energy Level             112   57.56       118   56.67      272   57.19      6.95     2,499          .48    .62
  Dominance (Leadership)   112   35.54       118   32.84      272   35.41      6.05     2,499         8.69    .00
  Physical Condition       112   32.96       118   28.27      272   31.08      7.49     2,499        12.10    .00

ᵃ Scores are based on persons who responded to the Honest condition first.
which is lower than both the Fort Knox and Fort Bragg means, 1.54 and
1.50, respectively), they do not score significantly higher on the response
validity scale Unlikely Virtues (Social Desirability). Indeed, there is no
particular tendency to score high rather than low. They score highest on
only two content scales and only one, Internal Control, is significant.

In sum, intentional distortion may not be a significant problem in an
applicant setting (as estimated in the present non-draft situation in the
United States). Faking or distortion would be a problem in a draft situation.
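
The F ratios in Table 8.15 compare three samples, and a one-way F can be approximated from the summary statistics alone. This is a rough sketch under an assumption: the tabled SD is treated as the pooled within-group SD, so its square stands in for the mean square within. The function name is hypothetical, and because the inputs are rounded the result only approximates the tabled value.

```python
def f_from_summary(ns, means, pooled_sd):
    """One-way ANOVA F ratio from group sizes, group means, and a pooled SD."""
    total_n = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / total_n
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    df_between = len(ns) - 1
    df_within = total_n - len(ns)
    ms_between = ss_between / df_between
    ms_within = pooled_sd ** 2  # assumption: tabled SD is the pooled within SD
    return ms_between / ms_within, (df_between, df_within)


# Unlikely Virtues (Social Desirability) row of Table 8.15:
# Fort Bragg Honest, MEPS recruits, Fort Knox.
f_ratio, df = f_from_summary([116, 121, 276], [15.91, 16.63, 16.60], 3.21)
print(df)                 # (2, 510)
print(round(f_ratio, 1))  # 2.1
```

The degrees of freedom match the table, and the F ratio lands close to the tabled 2.15, which is consistent with the finding that the three samples do not differ significantly on this scale.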

Faking Study Results - Interests Inventory

We divided the AVOICE scales into the two groups, combat-related and
combat support, that emerged when we factor analyzed the AVOICE Fort Knox
data. We then performed a multivariate analysis of variance (MANOVA) on
the experimental data from Fort Bragg. Tables 8.16 and 8.17 show the
findings for the interactions, the sources of variance most relevant to the
question of whether soldiers can or cannot intentionally distort their
responses.

As can be seen, most of the combat-related AVOICE scales and most of
the combat support scales are sensitive to intentional distortion. The
interaction of Fake x Set x Order is significant or marginally significant,
indicating that the order of conditions in which the participants completed
the AVOICE also affected the results.

Tables 8.18 and 8.19 show mean scores for the various conditions when
the particular condition was the first administration. When told to distort
their responses so that they would not be likely to be placed in
combat-related occupational specialties (MOS), that is, instructed to Fake
Noncombat, soldiers tended to decrease their scores; scores on most scales
were lower in Fake Noncombat as compared to the Honest condition. In the
Fake Combat condition, soldiers in general increased their combat-related
scale scores and decreased their combat support scale scores.

We also examined the extent to which the ABLE response validity
scales, which had demonstrated they could detect intentional distortion,
could be used to adjust AVOICE scale scores for faking combat and faking
noncombat. Table 8.20 shows the adjusted mean differences in AVOICE scale
scores after regressing out ABLE Social Desirability and Poor Impression.
Comparing these differences to the unadjusted differences shown in Tables
8.18 and 8.19 shows that these adjustments have little effect. Perhaps this
is because the response validity scales consisted of items from the ABLE and
the faking instructions for the ABLE and AVOICE were different. The ABLE
faking instructions were Fake Good and Fake Bad, whereas the AVOICE faking
instructions were Fake Combat and Fake Noncombat.

We also investigated the question of whether or not applicants would
tend to distort their responses to the AVOICE. Tables 8.21 and 8.22 show
the mean scores for the MEPS recruits and the two Honest conditions, Fort
Bragg and Fort Knox. There appears to be no parti-
Table 8.16

Fakability Study, MANOVA Results for AVOICE Combat-Related Scales: Fort Bragg
(N = 164)

                                           Interactions
Type and Name of Scale            Fake x Set      Fake x Set x Order

Combat-Related Scales
  Overall                             S                 NS*
  Marksman                            S
  Agriculture                         S
  Armor/Cannon                        S
  Vehicle/Equipment Operator          S
  Outdoors                            S
  Infantry                            S
  Law Enforcement                     S
  Heavy Construction/Combat           S
  Mechanics                           NS
  Electronics                         NS
  Adventure                           S

Note: S = Significant, p < .01.

NS = Nonsignificant, p > .01.

* = Marginally significant, .05 > p > .01.
Table 8.17

Fakability Study, MANOVA Results for AVOICE Combat Support Scales: Fort Bragg
(N = 201)

                                           Interactions
Type and Name of Scale            Fake x Set      Fake x Set x Order

Combat Support Scales
  Overall                             S                 S
  Mathematics                         S                 NS
  Aesthetics                          S                 S
  Leadership                          S                 S
  Electronic Communication            S                 S
  Automated Data Processing           S                 S
  Teaching/Counseling                 NS                NS
  Drafting                            NS                NS
  Audiographics                       NS                NS
  Science/Chemical Operations         S                 NS
  Supply Administration               S                 NS
  Office Administration               S                 NS
  Medical Services                    NS*               NS
  Food Service                        S                 NS*

Note: S = Significant, p < .01.

NS = Nonsignificant, p > .01.

* = Marginally significant, .05 > p > .01.
Table 8.18

Effects of Faking, AVOICE Combat Scales: Fort Bragg

                                                                                 Effect Sizeᵇ
                       Honestᵃ            Fake Combatᵃ       Fake Noncombatᵃ     Honest vs.   Honest vs.
AVOICE Combat Scales   N     M      SD    N     M      SD    N     M      SD     Combat       Noncombat

Marksman               122   18.1    4.5  58    20.2    3.9  60    12.8    5.9    -.49        1.06
Agriculture            124   15.0    3.8  59    12.9    3.6  60    15.1    4.0     .56        -.03
Armor/Cannon           124   24.2    5.8  59    28.9    7.6  60    15.1    6.3    -.73        1.53
Vehicle/Equipment      124   28.7    6.4  59    26.6    7.9  60    23.5    8.0     .30         .75
Outdoors               123   36.0    6.1  59    38.3    6.0  60    25.7   10.2    -.38        1.34
Infantry               123   33.5    6.8        37.8    8.2  59    20.5    8.4    -.59        1.77
Law Enforcement        124   53.3   10.8  59    54.5   12.1  60    42.3   12.5    -.11         .97
Heavy Construction     124   70.5   16.3  59    68.9   15.0  59    58.7   16.4     .10         .72
Mechanics              124   50.7   12.7  59    44.6   15.2  60    47.3   13.6     .45         .26
Electronics            124   58.1   18.3  59    50.3   17.3  60    56.8   18.0     .43         .07
Adventure              108   37.5    4.3  56    38.1    3.7  54    26.8    6.6    -.15        2.06

ᵃ Values are based on the sample that completed the questionnaire under the condition of interest first.
ᵇ Effect size = (Mean Honest minus Mean Fake Combat, or Noncombat)/SD Total.
Table 8.19

Effects of Faking, AVOICE Combat Support Scales: Fort Bragg

                                                                                       Effect Sizeᵇ
AVOICE                        Honestᵃ            Fake Combatᵃ       Fake Noncombatᵃ    Honest vs.   Honest vs.
Combat Support Scales         N     M      SD    N     M      SD    N     M      SD    Combat       Noncombat

Mathematics                   120   14.2    4.7  56    11.8   4.V   59    15.6    5.0    .31         -.29
Aesthetics                    120   14.6    4.1  57    12.1    4.6  59    17.1                       -.54
Leadership                    124   22.3    4.2  59    2:.5    4.1  59    17.3    5.8   10 . IT      1.05
Electronic Communication      123   21.1    6.1  59    21.8    7.0  60    14.2   y.0                 1.16
Automated Data Processing     122   20.4    6.7  58    15.5    7.2  59            7.4   • ft         -.49
Teaching/Counseling           124   23.8    5.7  59    20.7    5.6  60    21.0   y.0                  .49
Drafting                      124   22.3    6.1  59    18.4    6.2  60    21.5    5.5   mi            .14
Audiographics                              A.9         10.f    6.2  60    20.7    5.6    .83          .50
Science/Chemical Operations   123   28.0    8.4  59    28.0    9.2  60    25.8    9.6    0            .25
Supply Administration         124   30.5    9.8  59    26.4    9.6  60    35.3   11.9    .42         -.46
Office Administration         123   38.5   13.3  59    31.2   12.3  59    49.5   17.2    .56         -.75
Medical Services              124   67.8   18.3  59    60.4   17.5  60    61.0   17.8    .41          .37
Food Service                  122   38.0   10.2  59    31.0   10.6  59    45.8   16.3    .68         -.62

ᵃ Values are based on the sample that completed the questionnaire under the condition of interest first.

ᵇ Effect size = (Mean Honest minus Mean Fake Combat, or Fake Noncombat)/SD Total.



Table 8.20 

Effects of Regressing Out Response Validity Scales (Unlikely Virtues and Poor
Impression) on Faking Condition, AVOICE Combat Scales Scores: Fort Bragg



Fake Combat



Fake Noncombat



Adjusted Standardized Correlation with
Mean Differenceᵃ Social Desirabilityᵇ



Adjusted Standardized Correlation with
Mean Differenceᵃ Poor Impressionᵇ



Combat-Related AVOICE Scales
Marksman
Agriculture
Armor/Cannon

Vehicle/Equipment Operator

Outdoors

Infantry

Law Enforcement

Heavy Construction/Combat

Mechanics

Electronics

Adventure (ABLE)



• .71 
.48 

•1.35 

• .39 

• .33 
•1.08 

• .12 

• .06 
.32 

• .03 
.09 



.08 
'.15 

,19 
-.02 

.03 

.08 
-.03 

«32 
-.02 
•.02 



1.31 
• .14 

1.08 
.59 

1.82 

1.38 
.86 
<32 
.47 
.30 
•1.09 



.14 
.11 
-.15 
.01 
•.27 
-.18 
•.13 
•.15 
-.04 
•.08 
•.45 



(Continued) 






Table 8.20 (Continued)

Effects of Regressing Out Response Validity Scales (Unlikely Virtues and Poor
Impression) on Faking Condition, AVOICE Combat Scales Scores: Fort Bragg

                                       Fake Combat                            Fake Noncombat
                               Adjusted          Correlation          Adjusted          Correlation
                               Standardized      with Social          Standardized      with Poor
Combat Support AVOICE Scales   Mean Differenceᵃ  Desirabilityᵇ        Mean Differenceᵃ  Impressionᵇ

Mathematics                      .23              -.05                 -.34               .22
Aesthetics                       .51              -.24                 -.17               .13
Leadership                      -.27              -.05                 1.28              -.16
Electronic Communication        -.55               .15                  .78              -.26
Automated Data Processing        .76              -.06                 -.29               .03
Teaching/Counseling              .56              -.13                  .76               .03
Drafting                         .26              -.05                                    .14
Audiographics                    .69              -.08                  .37               .05
Science/Chemical Operations     -.53               .01                  .10               .04
Supply Administration            .05              -.05                  .04               .21
Office Administration            .37              -.10                 -.15               .24
Medical Services                 .00              -.11                  .33               .10
Food Service                     .39              -.13                 -.75               .26

ᵃ Standardized mean differences are [Mean (Honest) minus Mean (Fake)]/SD (Honest).
ᵇ Correlations are average of correlations for first administration under Honest and relevant Fake condition.



Table 8.21

Comparison of Fort Bragg Honest, Fort Knox, and MEPS (Recruits) AVOICE Combat-
Related Scalesᵃ

                               Fort Braggᵇ      MEPS
                               (Honest)         (Recruits)       Fort Knox       Pooled   Degrees of
Combat-Related AVOICE Scales   N     Mean       N     Mean       N     Mean      SD       Freedom       F       p

Marksman                       122   18.1       121   17.0       256   15.8       4.4     2,496        12.0     .00
Agriculture                    124   15.0       124   15.4       267   14.1       3.7     2,512         4.5     .01
Armor/Cannon                   124   24.2       125   27.0       268   22.4       6.2     2,514        22.8     .00
Vehicle/Equipment Operator     124   28.7       125   31.0       268   28.1       7.2     2,514         7.3     .00
Outdoors                       123   36.0       125   35.2       268   31.7       6.1     2,513        26.7     .00
Infantry                       123   33.5       125   33.2       268   29.1       6.8     2,513        24.8     .00
Law Enforcement                124   53.3       124   48.4       265   48.1      11.3     2,510        10.1     .00
Heavy Construction/Combat      124   70.5       124   70.6       269   65.8      16.8     2,514         5.3     .01
Mechanics                      124   50.7       125   53.4       269   50.0      14.1     2,515         2.54    .08
Electronics                    124   58.1       125   59.5       266   60.0      17.5     2,512         0.5     .62
Adventure (ABLE)               108   37.5       101   35.5       211   32.8       5.2     2,417        31.5     .00

ᵃ MANOVA significant using Wilks' Lambda (F = 6.3; df = 22,754; p = .00).
ᵇ Fort Bragg data are for honest first condition only.






Table 8.22

Comparison of Fort Bragg Honest, Fort Knox, and MEPS (Recruits) AVOICE Noncombat-
Related Scalesᵃ

                                  Fort Braggᵇ      MEPS
                                  (Honest)         (Recruits)       Fort Knox       Pooled   Degrees of
Noncombat-Related AVOICE Scales   N     Mean       N     Mean       N     Mean      SD       Freedom       F       p

Mathematics                       120   14.2       122   13.7       252   15.1       4.4     2,491         4.7     .01
Aesthetics                        120   14.7       121   13.8       261   14.3       4.2     2,499         4.2     .02
Leadership                        124   22.3       125   19.7       269   20.3       4.5     2,515        11.9     .00
Electronic Communication          123   21.1       125   21.7       268   21.1       5.7     2,513         0.4     .67
Automated Data Processing         122   20.4       121   19.0       256   23.3       6.3     2,496        22.1     .00
Teaching/Counseling               124   23.8       125   21.0       268   22.9       8.7     2,514         8.7     .00
Drafting                          124   22.3             ?fl 7      CfV   £1.7      A 1      C,3lO         1.0
Audiographics                     124   23.5       124   22.1       269   23.8       5.6                   4.2     .02
Science/Chemical Operations       123   28.0       125   26.9       269   29.3       8.8     2,514         3.5     .03
Supply Administration             124   30.5       125   33.1       268   34.6       9.8     2,515         8.9     .00
Office Administration             123   38.5       125   38.0       267   45.2      12.7     2,512        19.4     .00
Medical Services                  124   67.8       125   61.1       267   68.5      18.8     2,513         6.9     .00
Food Service                      122   38.0       125   42.4       269   42.2      10.8     2,513         7.4     .00

ᵃ MANOVA significant using Wilks' Lambda (F = 6.1; df = 26,896; p = .00).

ᵇ Fort Bragg data are for honest first condition only.



cular pattern to the mean score differences. The applicants score lowest,
highest, and in the middle about an equal number of times.

Overall, the AVOICE data from the faking study show that:

1. Soldiers can distort their responses when instructed to do so.

2. The ABLE Social Desirability and Poor Impression scales are not
as effective for adjusting AVOICE scale scores in the faking
conditions of Combat/Noncombat as they are for adjusting ABLE
content scale scores in the Faking Good/Faking Bad conditions.

3. Faking or distortion may not be a significant problem in an
applicant setting.
CONCLUDING COMMENTS



The field tests of the non-cognitive measures indicate they are good
measures of the intended constructs and that they are likely to contribute
unique, reliable variance to the predictor domain. Score distributions and
reliabilities show the measures to be sound psychometrically. The
uniqueness analyses showed that the ABLE and AVOICE scales are measuring
individual differences largely independent from those measured via the
ASVAB or other parts of the Pilot Trial Battery. Factor analyses of ABLE
and AVOICE scales showed a relatively simple underlying structure that
makes intuitive sense. Investigations of faking and fakability indicate
scores can be intentionally distorted when persons are instructed to do so,
but distortion does not appear to occur in the present applicant setting,
and the response validity scales on the ABLE can probably be used to
correct for distortion when it does occur. However, more research is needed
on the methods of applying such corrections and the effects of such
corrections on the validity of the non-cognitive scales for predicting job
performance or other important criteria.
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CHAPTER 9 



FORMULATION OF THE TRIAL BATTERY

Jody L. Toquam, Leaetta M. Hough, Bruce N. Barge, Rodney L. Rosse,
Janis S. Houston, and VyVy A. Corpe



The way in which the Pilot Trial Battery was revised to produce the
Trial Battery is described in this chapter. The previous chapters have
presented and discussed the development, pilot tests, and field tests of
the Pilot Trial Battery. They show, we think, that the Pilot Trial Battery
measures, as a whole, are psychometrically sound, measure relatively unique
constructs, and appear to hold considerable promise as predictors of
various important criteria of job performance for Army soldiers. The
revisions described here focused on satisfying the pragmatic criterion of
limited testing time available for future Project A research, as well as
improving the measures in the Pilot Trial Battery.

REVISIONS TO THE PILOT TRIAL BATTERY



The full Pilot Trial Battery, as administered at the field tests,
required approximately 6.5 hours of actual administration time. However,
the Trial Battery developed from the Pilot Trial Battery (see Figure 1.2)
had to be administered in about 4 hours during the next phase of the
project (Concurrent Validation). Therefore, not only did the measures in
the Pilot Trial Battery need revision on the basis of field test
experience, but the total length of the battery had to be reduced by 33
percent.

We devised three general principles, which we called a strategy, to be
used as a guide in making the revision and reduction decisions. These
principles were consonant with the theoretical and practical orientation
that had been used since the inception of the project, as described in
Chapter 1. The principles were:

• Maximize the heterogeneity of the battery by retaining measures of 
as many different constructs as possible. 

• Maximize the chances of incremental validity and classification
efficiency as much as possible.

• Retain measures with adequate reliability. 

Five more concrete implications or guidelines for adopting this
strategy were developed. These are shown in Figure 9.1. With these
guidelines in mind, Task 2 staff prepared summaries and presentations of
the information described in Chapters 2 through 8.

In March 1985, these presentations were made at an In Progress Review
(IPR) meeting held to consider the field test data and other relevant
information, and decide on the methods and nature of revising the Pilot
Trial Battery. Generally speaking, the presentations were within the three
domains that had been used throughout the research (point 1 in Figure
9.1): cognitive (paper-and-pencil), perceptual/psychomotor
(computer-administered), and non-cognitive. The psychometric
characteristics of each measure within a domain were reported, followed by
a presentation of the covariance (correlations and factor structure) of the
measures within the domain, across the domains, and with the ASVAB
(uniqueness analyses). Then, estimates of expected validities for training
and job performance criteria (based on the expert judgments, literature
review, and Preliminary Battery analyses) were presented. Finally, initial
recommendations for reduction and revisions were made.

Considerable discussion was generated by these presentations, but the
IPR group reached a consensus on the reductions and revisions to be made to
the Pilot Trial Battery. This set of recommendations was then presented to
and discussed at the meeting of the full Scientific Advisory Group. A few
changes were made at this meeting.

1. Retain Measures in All Three Predictor Areas:

• Cognitive (Paper-and-Pencil) 

• Perceptual/Psychomotor (Computer-Administered)

• Non-Cognitive (Paper-and-Pencil) 

2. Retain Measures That Add Unique Variance 

• Variance Not Accounted for by ASVAB 

• Variance Not Accounted for by Other Pilot Trial Battery 
Measures 

3. Retain Measures That Predict Training Success and/or for 
which Experts or Literature Review Suggests Validity for Job 
Performance, Especially for Important Criteria or Criteria 
Not Presently Predicted by ASVAB 

4. Retain Measures That Show Stability With Respect to: 

• Test-Retest 

• Practice 

• Faking/Fakability 

5. Within Measures, Retain Items That Measure the Dominant 
Construct and Maximize Content Coverage 



Figure 9.1 Guidelines for evaluating and retaining Pilot Trial Battery
measures in order to produce the Trial Battery.

Tables 9.1, 9.2, and 9.3 summarize the change recommendations that
came from these meetings. These recommendations were used to guide the
development of the Trial Battery from the Pilot Trial Battery. In the
following sections, we describe these changes and their rationales, plus
any internal improvements made to each measure.

Changes to Cognitive Paper-and-Pencil Tests

Analyses of pilot and field tests of the cognitive paper-and-pencil 
tests showed that the tests, as a group, measure various aspects of spatial 
ability. When factor-analyzed with ASVAB subtests and the computer- 
administered tests from the Pilot Trial Battery, they formed a single 
factor of their own (see Table 6.10). Factor analysis of the tests by 
themselves, however, tends to show four or five factors (see Table 3.13). 
These results are not surprising, but we point them out to illustrate the 
point that the identification of the number and type of constructs measured 
by a set of tests depends very much on the level of analysis a researcher 
chooses. For purposes defined here, that is, reducing the number of tests 
to carry forward from the Pilot Trial Battery to the Trial Battery, we 
focused on a more specific level (four or five factors), but kept in mind that
all the tests measure an underlying, more global spatial ability. Changes 
to the cognitive tests for use in the Trial Battery are described in the 
context of the constructs the tests were designed to measure: Spatial 
Visualization-Rotation and Field Independence, Spatial Visualization- 
Scanning, Figural Reasoning/Induction, and Spatial Orientation. 

In the Pilot Trial Battery, the Spatial Visualization-Rotation and
Field Independence construct was measured by three tests: Assembling
Objects, Object Rotation, and Shapes. Although Shapes was originally
designed to measure Field Independence, and pilot test results indicated it
correlated .50 with a marker test of that ability, we considered this test
together with the Rotation tests for purposes of reducing the size of
the Pilot Trial Battery. This combination seemed justified because the
three tests had a similar pattern of factor loadings (see Table 3.13).
The Shapes Test was dropped because the evidence of validity for job
performance for tests of this type was judged to be less impressive than
for the other two tests. The Object Rotation Test was not changed. Eight
items were dropped from the Assembling Objects Test by eliminating those
items that were very difficult or very easy, or had low item-total
correlations. The time limit for Assembling Objects was not changed. The
effect was to make Assembling Objects more a power test than it was prior
to the changes.
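
The item screening described for Assembling Objects can be illustrated
with a short sketch. It is a minimal example under stated assumptions, not
the procedure actually used: the cutoffs (p_min, p_max, r_min), the function
names, and the data are hypothetical, and the item-total correlation is
computed against the total with the item itself removed.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def flag_items(responses, p_min=0.10, p_max=0.90, r_min=0.20):
    """Return indices of items that are too hard, too easy, or weakly related
    to the rest of the test. `responses` holds one list of 0/1 item scores
    per examinee. The three cutoffs are illustrative assumptions."""
    flagged = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]  # total without item j
        difficulty = sum(item) / len(item)               # proportion correct
        if not p_min <= difficulty <= p_max or pearson(item, rest) < r_min:
            flagged.append(j)
    return flagged


# Hypothetical data: 6 examinees by 5 items.
demo = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
]
flagged_demo = flag_items(demo)
print(flagged_demo)
```

In this made-up data set, item 0 is answered correctly by everyone and
item 4 runs against the rest of the test, so those two are the ones flagged.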

The Spatial Visualization-Scanning construct was measured by two
Pilot Trial Battery tests, Mazes and Path. The Path Test was dropped and
the Mazes Test was retained with no changes. Mazes showed higher
test-retest reliabilities than Path (.71 vs .64) and lower gain scores (.24
SD units for Mazes vs .62 SD units for Path), which was desirable. In
addition, Mazes was a shorter test than Path (5.5 minutes vs 8 minutes).
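
A gain score in SD units can be read as the retest mean minus the
first-administration mean, divided by a standard deviation. The sketch
below assumes the first-administration SD is the divisor, which the text
does not specify; the function name and the numbers are hypothetical and
serve only to show the arithmetic.

```python
def gain_in_sd_units(mean_test, mean_retest, sd_test):
    """Standardized gain: (retest mean - first-test mean) / first-test SD.

    Using the first-test SD as the divisor is an assumption of this sketch.
    """
    if sd_test <= 0:
        raise ValueError("sd_test must be positive")
    return (mean_retest - mean_test) / sd_test


# Hypothetical means and SD, chosen only to illustrate the calculation.
example_gain = gain_in_sd_units(mean_test=20.0, mean_retest=21.2, sd_test=5.0)
print(round(example_gain, 2))
```

A smaller value indicates that scores change less with practice, which is
why the lower gain for Mazes was considered desirable.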

The Figural Reasoning/Induction construct was measured by the
Reasoning 1 and Reasoning 2 tests. Reasoning 1 was evaluated as the better
of the two tests because it had higher reliabilities for both internal
consistency (alpha = .83 vs .65, and separately timed, split-half
coefficients = .78 vs .63) and test-retest (.64 vs .57), as well as a
higher uniqueness estimate (.49 vs .37). Reasoning 1 was retained with no
item or time limit changes and Reasoning 2 was dropped. Reasoning 1 was
renamed Reasoning Test.

Table 9.1

Summary of Changes to Cognitive Paper-and-Pencil Measures in the
Pilot Trial Battery

Test Name               Changes

Assembling Objects      Decrease from 40 to 32 items.
Object Rotation         Retain as is with 90 items.
Shapes                  Drop Test.
Mazes                   Retain as is with 24 items.
Path                    Drop Test.
Reasoning 1             Retain as is with 30 items.
                        New name REASONING TEST.
Reasoning 2             Drop Test.
Orientation 1           Drop Test.
Orientation 2           Retain as is with 24 items.
                        New name ORIENTATION TEST.
Orientation 3           Retain as is with 20 items.
                        New name MAP TEST.

Table 9.2

Summary of Changes to Computer-Administered Pilot Trial Battery Measures

Test Name                     Changes

COGNITIVE/PERCEPTUAL TESTS

Demographics                  Eliminate race, age, and typing experience
                              items. Retain SSN and video experience
                              items.
Simple Reaction Time          No changes.
Choice Reaction Time          Increase number of items from 15 to 30.
Perceptual Speed & Accuracy   Reduce items from 48 to 36. Eliminate word
                              items.
Target Identification         Reduce items from 48 to 36. Eliminate
                              moving items. Allow stimuli to appear at
                              more angles of rotation.
Short-Term Memory             Reduce items from 48 to 36. Establish a
                              single item presentation and probe delay
                              period.
Cannon Shoot                  Reduce items from 48 to 36.
Number Memory                 Reduce items from 27 to 18. Shorten item
                              strings. Eliminate item part delay periods.

PSYCHOMOTOR TESTS

Target Tracking 1             Reduce items from 27 to 18. Increase item
                              difficulty.
Target Tracking 2             Reduce items from 27 to 18. Increase item
                              difficulty.
Target Shoot                  Reduce items from 40 to 30 by eliminating
                              the extremely easy and extremely difficult
                              items.

Table 9.3

Summary of Changes to Pilot Trial Battery Versions of Assessment of
Background and Life Experiences (ABLE) and Army Vocational Interest
Career Examination (AVOICE)

Inventory/Scale Name                  Changes

ABLE Total                            Decrease from 270 to approximately
                                      199 items.
AVOICE Total                          Decrease from 309 to approximately
                                      228 items.
AVOICE Expressed Interest Scale       Drop scale.
AVOICE Single Item Holland Scales     Drop scales.
AVOICE Agriculture Scale              Drop scale.
Organizational Climate/Environment    Move to criterion measure booklet
Preference Scales                     (delete from AVOICE booklet).

In addition to the changes outlined in this table by inventory/scale,
it was recommended that all ABLE item response options be standardized
as three-option responses and all AVOICE item response options be
standardized as five-option responses.

Three tests measured the Spatial Orientation construct in the Pilot 
Trial Battery. Orientation 1 was dropped because it showed lower test- 
retest reliabilities (.67 vs .80 and .84) and higher gain scores (.63 SD 
units vs .11 and .08 SD units). In addition, we modified the instructions 
for Orientation 2 because field test experience had indicated that the PTB 
instructions were not as clear as they should be. Orientation 2 was re- 
named Orientation Test. Orientation 3 was retained with no changes and 
renamed Map Test. 

Changes to Perceptual/Psychomotor Computer-Administered Tests

Before describing the changes made to specific perceptual/psychomotor 
tests in the computer-administered battery, we describe several improve- 
ments to the computer battery as a whole. 

Modifications in Computer Administration Procedures. The general
changes included the following:

1. Virtually all test instructions were modified, in these ways:

• Most instructions were shortened considerably.

• Names of buttons, slides, and switches on the response pedestals
were written in capital letters whenever they appeared in the
instructions (e.g., BLUE, VERTICAL, RIGHT) to attract subjects'
attention faster and more effectively.



• Test terms and jargon were standardized. For example, in the 
PTB test instructions, the response pedestal was at various 
times called the "testing panel," the "response panel," and the 
"response pedestal." In the Trial Battery instructions, this 
apparatus was always referred to as the "response pedestal." 

• Where possible, the following standard outline was used in
preparing the instructions:

-- Test name
-- One-sentence description of the purpose of the test
-- Step-by-step test instructions
-- One practice item
-- Brief re-statement of test instructions
-- Two or three additional practice items
-- Instructions to call test administrator if there are questions
   about the test

2. Whenever test items had a correct response, the subject was given
feedback on the practice items to indicate whether he/she had
answered the item correctly.

3. Rest periods were eliminated from the battery. (Previously, there
were rest periods between the first half and second half of the
items within several of the tests.) This was feasible because most
tests were shortened.

4. The computer programs controlling test administration were merged 
into one super-program, eliminating the time required to load the 
programs between tests. 

5. The format and parameters used in the software containing test
items were reworded, so that the software was more "self-documented."

6. The total time allowed for subjects to respond to a test item (or, 
in other words, response time limit) was set at 9 seconds for all 
reaction time tests (Simple and Choice Reaction Time, Short-Term 
Memory, Perceptual Speed and Accuracy, and Target Identification). 
In the PTB version the response time limit had varied from test to 
test, for no particular reason. The field test data showed that, 
on almost all trials of all reaction time tests, subjects were able
to respond within 9 seconds. Therefore, the 9-second time limit 
was adopted as a standard. 

7. Also, with regard to the reaction time tests, the software was
changed so that the stimulus for an item disappeared when the
subject lifted his/her hand from the home button (in order to make
a response). Subjects are instructed not to lift their hands from
the home buttons until they have determined the correct response;
in this manner, separate measures of decision and movement time can
be obtained. However, more than a few of the field test subjects
continued to study the item stimulus to determine the correct
response after leaving the home buttons. By causing the item to
disappear, we hoped to eliminate that problem.
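
The separation of decision time from movement time, and the 9-second
response limit, can be sketched from three event times per trial. This is
an illustration only: the Trial record, its field names, and the choice to
measure the limit from stimulus onset to the response press are
assumptions, not a description of the actual test software.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    stimulus_onset: float   # seconds on a common clock
    home_release: float     # subject lifts hand from the home button
    response_press: float   # subject presses a response button


def decision_time(trial):
    """Time from stimulus onset until the hand leaves the home button."""
    return trial.home_release - trial.stimulus_onset


def movement_time(trial):
    """Time from leaving the home button until the response button is pressed."""
    return trial.response_press - trial.home_release


def within_time_limit(trial, limit_seconds=9.0):
    """True if the whole response fits inside the response time limit."""
    return trial.response_press - trial.stimulus_onset <= limit_seconds


# Hypothetical trial used only to show the arithmetic.
example = Trial(stimulus_onset=0.0, home_release=0.5, response_press=0.75)
print(decision_time(example), movement_time(example), within_time_limit(example))
```

Removing the stimulus at the moment of home-button release is what keeps
the first interval a clean measure of decision time: the subject can no
longer keep studying the item during the movement phase.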

All of the changes to the overall computer-administered test battery
described above, and the individual test changes described below, were
subjected to a series of small sample tryouts (N < 6 in each tryout) at the
Minneapolis Military Entrance Processing Station (MEPS). These tryouts were
for the purpose of inspecting and evaluating the software changes
(including test items), eliciting feedback about instruction changes, and
ensuring that the time needed to take the computer-administered test
battery was within the time that would be available for the upcoming
Concurrent Validation phase of Project A. No data were analyzed as a result
of these tryouts because the total N was too small (less than 40), but they
fulfilled the purpose of ensuring that all changes were made correctly and
were achieving the end desired.

Changes to Content of Tests Administered by Computer. We turn now to
a description of the specific changes made to the individual
computer-administered tests for use in the Trial Battery.

In the demographic section of the computer battery, items asking about
age, race, and typing experience were deleted. Information on age and race
is available from other sources. Typing experience is no longer relevant
since subjects' responses are now obtained via the response pedestal
instead [...]

No changes were recommended for Simple Reaction Time. However, we
re-randomized the order of the pretrial intervals (the interval between the
time the subject depresses the home button keys and the appearance of the
trial stimulus). This was done because the pretrial intervals (the order
of these intervals had been randomly determined) tended to increase over
trials 7-14, then dropped precipitously for trial 15; as a result, mean
response time for trial 15 was significantly higher than mean response
times for the previous several trials. Re-randomization was therefore
considered desirable, to remove this abnormality.
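
One way to guard against the kind of accidental trend described above is
to reshuffle until the order contains no long run of steadily increasing
intervals. The sketch below is an assumption about how such a check could
be done, not the procedure the project used; the function names, the
max_run threshold, and the interval values are all hypothetical.

```python
import random


def longest_increasing_run(values):
    """Length of the longest run of strictly increasing consecutive values."""
    if not values:
        return 0
    longest = current = 1
    for prev, cur in zip(values, values[1:]):
        current = current + 1 if cur > prev else 1
        longest = max(longest, current)
    return longest


def rerandomize(intervals, max_run=4, seed=0, max_attempts=1000):
    """Shuffle the intervals until no increasing run exceeds max_run."""
    rng = random.Random(seed)   # seeded so the order is reproducible
    order = list(intervals)
    for _ in range(max_attempts):
        rng.shuffle(order)
        if longest_increasing_run(order) <= max_run:
            return order
    raise RuntimeError("no acceptable order found")


# Hypothetical pretrial intervals (seconds) for 15 trials.
demo_intervals = [round(1.0 + 0.25 * i, 2) for i in range(15)]
new_order = rerandomize(demo_intervals)
print(new_order)
```

The same set of intervals is kept; only their order changes, so the
overall distribution of waiting times is unaffected.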

The number of items in Choice Reaction Time was increased from 15 to 
30 in an attempt to increase the test-retest reliability for mean reaction 
time on this test. 

Twelve items were eliminated from Perceptual Speed and Accuracy
(reduced from 48 to 36 items), primarily to save time. Internal consistency
estimates were high for scores on this test (.83, .96, .88, and .74 for
Percent Correct, Mean Reaction Time, Slope, and Intercept, respectively),
so item reduction did not seem to be cause for concern in that regard.
Test-retest reliabilities were lower than internal consistencies, but it
was not clear that item reduction would affect this greatly. The 12 items
eliminated were all the "word" items (see Chapter 5 for a description of
the item types in this test) rather than any of the alpha, numeric,
symbolic, or mixed items, because word items were not used to calculate two
of the scores, Slope and Intercept.
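
A rough sense of how much internal consistency might be lost by shortening
a test can be had from the Spearman-Brown formula. The text does not say
this formula was used, so the sketch below is offered only as an
illustration; it also assumes the dropped items are interchangeable with
the retained ones, which is not strictly true here because the word items
were a distinct item type.

```python
def spearman_brown(reliability, length_ratio):
    """Projected reliability when test length is multiplied by length_ratio."""
    return length_ratio * reliability / (1 + (length_ratio - 1) * reliability)


# Internal consistency estimates reported in the text for the 48-item version.
ratio = 36 / 48
projected = {
    "Percent Correct": spearman_brown(0.83, ratio),
    "Mean Reaction Time": spearman_brown(0.96, ratio),
}
for score, value in projected.items():
    print(f"{score}: {value:.2f}")
```

Slope and Intercept are left out of the example because, as noted above,
the word items never entered those two scores, so removing them does not
shorten the item set on which those scores are based.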

Several changes were made to the Target Identification Test. First,
one of the two item types, the "moving" items, was eliminated. Field test
data showed that scores on the "moving" and stationary items correlated
.78, and the moving items had lower test-retest reliabilities than
stationary items (.54 vs .74) and also had lower uniqueness estimates (.44
vs .56). Also, two item parameters were modified. All target objects
were made the same size (50% of the size of the objects depicted as
possible answers) since field test analyses indicated size had had no
appreciable effect on reaction time. A third level of angular rotation was
added so the target objects were rotated either 0°, 45°, or 75°.
Theoretically, and as found in past research, reaction time is expected to
increase with greater angular rotation. Two of the item parameters were not
changed (position of correct response object and direction of target
object). Finally, the number of items was reduced from 48 to 36 in order to
save time. Internal consistency and test-retest estimates indicated that
the level of risk attached to this reduction would be acceptable. (For Mean
Reaction Time, the internal consistency estimate was .96 and the
test-retest estimate was .67.) The reduction from 48 to 36 items was
accomplished by retaining 12 of the 24 "moving" items (which were to be
eliminated as an item type, as noted above) as stationary items. That is,
the items had the same parameters they possessed as "moving" items, but
were presented as "stationary" items. The retained items were those that
had the proper item parameters to allow a balanced number of items in each
of the cells defined by crossing the item parameters. The test, as
modified, had two items in each of 18 cells determined by crossing angular
rotation (0°, 45°, 75°), position of correct response object (left, center,
or right of screen), and direction of target object (left-facing or
right-facing).
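
The balanced layout just described (three rotations by three positions by
two directions, with two items per cell) can be enumerated directly. The
sketch below is illustrative; the position labels are placeholders and the
helper name is hypothetical.

```python
from collections import Counter
from itertools import product

ROTATIONS = (0, 45, 75)                       # degrees, as given in the text
POSITIONS = ("pos_1", "pos_2", "pos_3")       # placeholder labels
DIRECTIONS = ("left-facing", "right-facing")
ITEMS_PER_CELL = 2

# Every combination of the three parameters is one cell of the design.
cells = list(product(ROTATIONS, POSITIONS, DIRECTIONS))
# Each item is represented here simply by the cell it belongs to.
items = [cell for cell in cells for _ in range(ITEMS_PER_CELL)]


def is_balanced(item_cells, all_cells, per_cell):
    """True if every cell holds exactly per_cell items and no others exist."""
    counts = Counter(item_cells)
    return (len(counts) == len(all_cells)
            and all(counts[cell] == per_cell for cell in all_cells))


print(len(cells), len(items), is_balanced(items, cells, ITEMS_PER_CELL))
```

A fully crossed, balanced layout of this kind lets the effect of each
parameter on reaction time be examined without the others being
confounded with it.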

One item parameter, probe delay period, was eliminated from the
Short-Term Memory Test, while two others, item type (symbolic vs. letter)
and item length (1, 3, or 5 objects), were retained. Analyses of field test
data showed that probe delay period did not significantly affect Mean
Reaction Time scores. To save time, 12 items were eliminated. (Eliminating
the probe delay period did not result in any reduction in items.)
Two of the three most important scores for this test appeared to have high
enough reliabilities to withstand such a reduction (internal consistency
and test-retest estimates were .94 and .78, respectively, for Mean Reaction
Time, .52 and .47 for Slope, and .84 and .74 for Intercept). Items were
eliminated by deleting those items that had the lowest item-total score
correlation, within the limitation of maintaining balance in the
distribution of items across the cells defined by item parameters.

Finally, the software controlling the administration of this test was
rewritten in an attempt to reduce the amount of missing data occurring on
the test. Field test data indicated that some subjects apparently did not
completely understand the instructions, and completed items
inappropriately, causing missing data (specifically, they released the home
buttons after the item's stimulus set disappeared but before the probe
appeared). The rewritten software gave feedback to the subject if an item
was inappropriately completed. If a subject completed three items
inappropriately, he/she was told (by a message on the screen) to call the
test administrator for further instruction; also, the test would not
continue until the administrator made a sequence of button pushes (unknown
to the subject).
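
The feedback and lockout behavior described above amounts to a small state
machine. The sketch below is a hypothetical reconstruction for
illustration: the class and method names are invented, the count is
treated as cumulative rather than consecutive, and resetting the count
after the administrator unlocks the test is an assumption.

```python
class CompletionMonitor:
    """Tracks inappropriately completed items and pauses the test when needed."""

    def __init__(self, max_inappropriate=3):
        self.max_inappropriate = max_inappropriate
        self.inappropriate_count = 0
        self.locked = False

    def record_item(self, released_before_probe):
        """Return the action to take after one item is completed."""
        if self.locked:
            raise RuntimeError("test is paused until the administrator unlocks it")
        if not released_before_probe:
            return "ok"
        self.inappropriate_count += 1
        if self.inappropriate_count >= self.max_inappropriate:
            self.locked = True
            return "call_administrator"
        return "feedback"

    def administrator_unlock(self):
        """Stands in for the administrator's button sequence."""
        self.locked = False
        self.inappropriate_count = 0   # assumption: the count starts over


monitor = CompletionMonitor()
actions = [monitor.record_item(flag) for flag in (True, False, True, True)]
print(actions, monitor.locked)
```

Here the third inappropriate completion pauses the test, and nothing
further can be recorded until administrator_unlock() is called.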

The number of items on the Cannon Shoot Test was reduced from 48 to
36, again to save time. Internal consistency and test-retest reliabilities
for the Time Error Score were high enough (.88 and .66, respectively) to
warrant such reduction without the expectation of a significant impact on
reliability.

Also, the items were modified to eliminate two problems observed
during the field tests. First, on some items, the target was actually not
on the screen as it began its movement toward the cannon's line of fire.
Second, on some items, the subject had to fire at the target almost as soon
as it appeared on the screen in order to hit the target with the cannon
shell. Such items provided subjects with little or no opportunity to
determine the speed and direction of the target, and thus to use movement
judgment, which was the construct we intended to measure. Therefore, the
test was modified so that all targets are visible on the screen at the
beginning of the trial and so that the subject is given at least a couple
of seconds to view the speed and direction of the target before the target
reaches the optimal fire point.

Two modifications were made to Number Memory to reduce test
administration time. The item part delay period was made a constant (1
second) rather than treated as a parameter with two levels (0.5 and 2.5
seconds), and the item string length (number of parts in an item) was
changed from 4, 6, or 8 parts to 2, 3, or 4 parts. These changes
drastically reduced the time required to complete the test. As a result, no
reduction in the number of items, as had been recommended (see Table 9.2),
was necessary. The Trial Battery version of this test had 28 items,
constructed so that there were 13 replications of the four arithmetic
operations (add, subtract, multiply, and divide).

Identical kinds of changes were made to the Target Tracking 1 and
Target Tracking 2 tests. Internal consistency and test-retest reliability
estimates were relatively high for these tests (internal consistency = .97
for both, test-retest = .68 and .77, for Tests 1 and 2, respectively), so
we felt confident we could reduce the number of items from 27 to 18 in
order to save time.

The difficulty of the test items was increased by increasing the speed
of the crosshair and the target. This was done because field test data
indicated that the Mean Distance Score was positively skewed; thus, the
items appeared not to be differentiating very well among high ability
subjects. By increasing the difficulty of the items, we hoped to create a
more normal distribution of scores. Related to this, we used the ratio of
target to crosshair speed as a test parameter, rather than target speed.
It seemed to make sense that, given a particular crosshair speed, the ratio
would be a better indicator of item difficulty than the actual target
speed.
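
Two quantities in this paragraph lend themselves to a short illustration:
the skew of a score distribution and the target-to-crosshair speed ratio.
The sketch below uses the ordinary moment-based skewness coefficient; the
text does not say which skew statistic was examined, and the function
names and sample values are hypothetical.

```python
def skewness(scores):
    """Moment-based skewness: third central moment divided by variance ** 1.5."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    if m2 == 0:
        raise ValueError("scores have zero variance")
    return m3 / m2 ** 1.5


def speed_ratio(target_speed, crosshair_speed):
    """Item parameter used in place of raw target speed."""
    return target_speed / crosshair_speed


# Hypothetical distance scores: most subjects bunch at small distances,
# with a long tail of large ones, which gives a positive skew.
demo_scores = [1, 1, 1, 2, 10]
print(skewness(demo_scores) > 0, speed_ratio(3.0, 2.0))
```

When most subjects cluster near the best possible score, as in the
made-up data, the test has little room to separate the more able
subjects from one another.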

Finally, we modified the software controlling test administration so 
that the crosshair could not travel off the screen. During the field test, 
if a subject moved his/her crosshair so that it traveled off the screen (a 
not infrequent occurrence when the target was near the edge of the screen),
he/she would lose sight of the crosshair. This caused problems for some 
subjects, who seemed not to know what to do when this happened. 

Several changes were made to the Target Shoot Test. First, all test
items were classified according to three parameters: crosshair speed,
ratio of target to crosshair speed, and item complexity (i.e., number of
turns/mean segment length). Then, items were revised in order to achieve a
balanced number of items in each cell when the levels of these parameters
were crossed. This had the result of "un-confounding" these parameters so
that analyses could be made to see which parameters contributed to item
difficulty.

Second, extremely difficult items were eliminated and item
presentation times (the time the target was visible on the screen) were
increased to a minimum of about 6 seconds (and a maximum of 10 seconds).
This was done to eliminate a severe missing data problem for such items (as
much as 40%), discovered during field tests. Missing data occurred when
subjects failed to "fire" at a target. Time-to-Fire and Distance From
Target scores could not be computed in these cases. These "no-fires" were
found to occur where the target moved very rapidly or made many sudden
changes in direction and speed, or the item lasted only a few seconds.
Thus, the elimination of such items and increase in item time were intended
to obviate the missing data problem. To save testing time, the number of
items was reduced from 40 to 30, primarily by eliminating the extremely
easy items. (Although test-retest reliabilities were only .48 and .58 for
Mean Time-to-Fire and Mean Log Distance scores, respectively, we thought
that solving the missing data problem would allow us to reduce the absolute
number of items and still maintain this level of test-retest reliability.)

Finally, we added a feedback message to this test that reminded the
subjects to press the red button (or "fire") when they had the crosshairs
on the target, if the subject failed to do so on the first practice item.
This was done because a small percentage of subjects in the field test did
not read the instructions carefully and treated this as a tracking test,
i.e., they did not "fire" at the target until several items had been
attempted. Usually the test administrator noticed this lapse by subjects,
but placing this feedback message gave greater assurance that subjects
would complete the test properly.

Changes to Non-Cognitive Measures (ABLE and AVOICE) 

Table 9.4 presents a summary of the item-reduction changes that were
made from the Pilot Trial Battery to the Trial Battery versions of the ABLE
and AVOICE, as projected in Table 9.3. We needed to effect a 25 percent
decrease in the total number of ABLE and AVOICE items. The goal in this
revision was to decrease items on a scale-by-scale basis, while preserving
the basic heterogeneity of each scale. The strategy adopted to accomplish
this was as follows for each scale:

1. Sort items into content categories.

2. Rank order within category, based on item-scale correlations. 

3. Drop last item in each category until desired number of items for 
that scale had been deleted. 
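
The three steps above can be sketched as a short routine. It is a minimal
illustration under stated assumptions, not the procedure as actually
carried out: categories are visited in alphabetical order on each pass, a
category is never reduced below one item (to keep every content area
represented), and the function name and data are hypothetical.

```python
def reduce_scale(items, n_to_drop):
    """Drop n_to_drop items from one scale, one per category per pass.

    items: list of (item_id, category, item_scale_correlation) tuples.
    Returns (kept, dropped).
    """
    # Step 1: sort items into content categories.
    by_category = {}
    for item in items:
        by_category.setdefault(item[1], []).append(item)
    # Step 2: rank order within category, best item-scale correlation first.
    for members in by_category.values():
        members.sort(key=lambda it: it[2], reverse=True)
    # Step 3: drop the last-ranked item in each category until done.
    dropped = []
    while len(dropped) < n_to_drop:
        progress = False
        for category in sorted(by_category):
            if len(dropped) == n_to_drop:
                break
            if len(by_category[category]) > 1:   # assumption: keep at least one
                dropped.append(by_category[category].pop())
                progress = True
        if not progress:
            raise ValueError("cannot drop that many items without emptying a category")
    kept = [it for category in sorted(by_category) for it in by_category[category]]
    return kept, dropped


# Hypothetical scale with three content categories.
demo_items = [
    ("a1", "A", 0.50), ("a2", "A", 0.30), ("a3", "A", 0.10),
    ("b1", "B", 0.45), ("b2", "B", 0.20),
    ("c1", "C", 0.40), ("c2", "C", 0.35), ("c3", "C", 0.25),
]
kept_demo, dropped_demo = reduce_scale(demo_items, n_to_drop=3)
print([item[0] for item in dropped_demo])
```

Because each pass removes at most one item per category, the shortened
scale keeps roughly the same mix of content as the original, which is the
sense in which heterogeneity is preserved.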

Table 9.5 lists the ABLE scales and the number of items in each for
the Pilot Trial Battery version and for the subsequent Trial Battery.
Overall, the ABLE was decreased from 270 items to 209 items. In addition 
to deleting items, we standardized all response options on the ABLE by (1) 
changing the several four- and five-option responses to three-option re- 
sponses and (2) ordering the response options so that the "highest" or 
"most" option (e.g., "All of the time") appeared first, and the "lowest" or 
"least" option (e.g., "None of the time") appeared third. Also, one last 
check was made to see whether there were still any response options that 
had such low endorsement rates as to be useless. A few such items were 
found, and their response options slightly modified. 

AVOICE scale revisions are listed in Table 9.6. The total number of 
AVOICE items was decreased from 309 to 214. Thirty-eight of these 214 are 
items on the Work Environment Preference scales. It was decided to take 
this whole section out of the AVOICE booklet and include it in one of the 
criterion measure booklets, where a bit more administration time was avail- 
able. Thus, 176 items remained in the AVOICE booklet. 

As can be seen in Table 9.6, the decision was made to delete the
Agriculture scale, the six single-item Holland scales, and the eight
Expressed Interest items. There was no particularly compelling technical or
psychometric reason for eliminating these scales; again, it was primarily a
pragmatic decision in order to reduce the time necessary to complete the
inventory. Reductions made on the remaining AVOICE scales were accomplished
using the same strategy as that for the ABLE, decreasing scale length while
preserving heterogeneity. The only items that had fewer than five response
options were deleted in the above-described revisions, so the resultant
Trial Battery AVOICE was made up entirely of five-option responses, from
"Like Very Much" to "Dislike Very Much."

Table 9.4

Summary of Item Reduction Changes for ABLE and AVOICE

                                            No. of Items
                           No. of Items     Recommended for     No. of Items
                           in PTB           Trial Battery*      in Trial Battery

ABLE                           270               199                 209

AVOICE, excluding              269               188                 176
Organizational
Climate/Environment
Scales

AVOICE, Organizational          40                40                  38
Climate/Environment
Scales

Total                          579               427                 423

* Based on IPR and SAG meetings described earlier in this chapter and
summarized in Table 9.3.
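
The totals in Table 9.4 and the size of the overall reduction can be
checked with a few lines of arithmetic. The item counts below are taken
from the table; the dictionary layout and variable names are only for
illustration.

```python
# Item counts as given in Table 9.4.
table_9_4 = {
    "ABLE": {"ptb": 270, "recommended": 199, "trial": 209},
    "AVOICE, excluding Org. Climate/Environment": {"ptb": 269, "recommended": 188, "trial": 176},
    "AVOICE, Org. Climate/Environment": {"ptb": 40, "recommended": 40, "trial": 38},
}

# Column totals across the three rows.
totals = {
    column: sum(row[column] for row in table_9_4.values())
    for column in ("ptb", "recommended", "trial")
}

# Overall percentage decrease from the Pilot Trial Battery to the Trial Battery.
percent_decrease = 100 * (totals["ptb"] - totals["trial"]) / totals["ptb"]

print(totals, round(percent_decrease, 1))
```

The column sums agree with the Total row of the table, and the realized
decrease comes out at roughly 27 percent, close to the 25 percent target
stated in the text.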
Table 9.5

Number of Items in Pilot Trial Battery and Trial Battery Versions
of ABLE Scales

                              No. of Items   No. of Items
ABLE Scale                    in PTB         in Trial Battery

Emotional Stability                29             18
Self-Esteem                        15             12
Cooperativeness                    24             18
Conscientiousness                  21             15
Nondelinquency                     24             20
Traditional Values                 16             11
Work Orientation                   27             19
Internal Control                   21             16
Energy Level                       25             21
Dominance                          16             12
Physical Condition                  9              6
Adventure                           8              8
Unlikely Virtues                   12             12
(Social Desirability)
Self-Knowledge                     13             13
Non-Random Responses                8              8
Poor Impression                    24             23

ABLE Total*                       270            209

* This figure is not the simple sum of the number of items in each scale,
since some items (e.g., on the Poor Impression Scale) are scored on more
than one scale.
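
The footnote can be illustrated by comparing the simple column sums of Table 9.5 with the reported totals. The Python sketch below uses the counts as printed in the table; the names `ABLE_SCALES`, `REPORTED_TOTALS`, and `simple_sums` are assumptions introduced here for illustration, not part of the report.

```python
# (PTB items, Trial Battery items) per ABLE scale, as printed in Table 9.5.
ABLE_SCALES = {
    "Emotional Stability": (29, 18),
    "Self-Esteem": (15, 12),
    "Cooperativeness": (24, 18),
    "Conscientiousness": (21, 15),
    "Nondelinquency": (24, 20),
    "Traditional Values": (16, 11),
    "Work Orientation": (27, 19),
    "Internal Control": (21, 16),
    "Energy Level": (25, 21),
    "Dominance": (16, 12),
    "Physical Condition": (9, 6),
    "Adventure": (8, 8),
    "Unlikely Virtues (Social Desirability)": (12, 12),
    "Self-Knowledge": (13, 13),
    "Non-Random Responses": (8, 8),
    "Poor Impression": (24, 23),
}

# Totals reported in the last row of the table (PTB, Trial Battery).
REPORTED_TOTALS = (270, 209)


def simple_sums(scales):
    """Sum each column over all scales; shared items are counted repeatedly."""
    ptb = sum(ptb_count for ptb_count, _ in scales.values())
    trial = sum(trial_count for _, trial_count in scales.values())
    return ptb, trial


ptb_sum, trial_sum = simple_sums(ABLE_SCALES)

# The excess over the reported totals counts the additional scale
# assignments of items that are scored on more than one scale.
print(ptb_sum - REPORTED_TOTALS[0], trial_sum - REPORTED_TOTALS[1])
```

The simple sums exceed the reported totals in both columns, which is consistent with the footnote's statement that some items contribute to more than one scale.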
Table 9.6

Number of Items in Pilot Trial Battery and Trial Battery Versions
of AVOICE Scales

                              No. of Items   No. of Items
AVOICE Scale                  in PTB         in Trial Battery

Marksman                            5              5
Agriculture                         5              0
Mathematics                         5              5
Aesthetics                          5              5
Leadership                          6              6
Electronic Communication            7              6
Automated Data Processing           7              6
Teacher/Counseling                  7              6
Drafting                            7              6
Audiographics                       7              5
Armor/Cannon                        8              7
Vehicle/Equipment Operator         10              6
Outdoors                            9              9
Infantry                           10              9
Science/Chemical Operations        11              7
Supply Administration              13              7
Office Administration              16             10
Law Enforcement                    16              9
Mechanics                          16             10
Electronics                        20             12
Heavy Combat/Construction          23             13
Medical Services                   24             12
Food Service                       17             11
Adventure                           6              6
Single-Item Holland Scales          6              0
Expressed Interest                  8              0
Organizational Climate/            40             38 (moved to
Environment Preferences                              criterion booklet)

AVOICE Total*                     309            214

* This figure is not the simple sum of the number of items in each scale,
since some items (e.g., on the Adventure Scale) are scored on more than
one scale.
DESCRIPTION OF THE TRIAL BATTERY AND SUMMARY COMMENTS

In this chapter we have described the revisions made to the Pilot
Trial Battery that produced the Trial Battery. In essence, the Trial
Battery is a shortened and improved version of the Pilot Trial Battery used
in the field tests. The Trial Battery was designed to be administered in a
period of 4 hours and will be used during the Concurrent Validation phase
of Project A.

Figure 9.2 shows a general description of the Trial Battery. These
are the measures that were the product of the revisions just described.
Appendix H contains copies of the Trial Battery measures (Appendix H is in
a separate limited-distribution report, ARI Research Note 87-24,
as noted on p. xiv).

As already noted, the Trial Battery's intended use is as a predictor 
battery in the Concurrent Validation phase of Project A. Those data will 
allow the replication of analyses described here on a much larger sample 
(approximately 10,000). In addition, job performance criterion data will 
be collected which will allow an examination of the validity of Trial 
Battery measures for predicting soldiers' job performance. All of this 
information will be used to make revisions to the Trial Battery, thereby 
producing the Experimental Battery that will be used in a Longitudinal 
Validation effort in 1986 and later years. (See Figure 1.2 for a flow 
chart showing the relationships between the Pilot Trial Battery, Trial 
Battery, and Experimental Battery.) 

Whatever the outcome of those future efforts, we think the
development, pilot testing, and field testing leading up to the Trial
Battery have reached the intended objectives. As already noted in Chapter 1
(see Task 2: Progress Summary), the measures developed came from a
careful, structured process that identified the "best bets" for improving
the prediction of soldiers' job performance. The new measures were
developed using an iterative process that resulted in steady improvements
guided by data. Procedures for efficiently and effectively administering
the measures were developed along with the measures themselves. Finally,
careful scrutiny of the psychometric characteristics of the measures shows
them to be satisfactory to excellent in that regard.
COGNITIVE PAPER-AND-PENCIL TESTS

Name                                  Number of Items   Time Limit

Reasoning Test                              30          12 minutes
Object Rotation Test                        90          7.5 minutes
Orientation Test                            24          10 minutes
Maze Test                                   24          5.5 minutes
Map Test                                    20          12 minutes
Assembling Objects Test                     32          16 minutes

PERCEPTUAL/PSYCHOMOTOR COMPUTER-ADMINISTERED TESTS

Name                                  Number of Items   Approximate Time

Demographics                                 2          4 minutes
Reaction Time 1                             15          2 minutes
Reaction Time 2                             30          3 minutes
Memory Test                                 36          7 minutes
Target Tracking Test 1                      18          8 minutes
Perceptual Speed and Accuracy Test          36          6 minutes
Target Tracking Test 2                      18          7 minutes
Number Memory Test                          28          10 minutes
Cannon Shoot Test                           36          7 minutes
Target Identification Test                  36          4 minutes
Target Shoot Test                           30          5 minutes

NON-COGNITIVE PAPER-AND-PENCIL INVENTORIES

Name                                  Number of Items   Approximate Time

Assessment of Background and               209          35 minutes
Life Experiences (ABLE)
Army Vocational Interest Career            176          20 minutes
Examination (AVOICE)

Figure 9.2. Description of Trial Battery measures.
APPENDIX A

Data Bases Searched 
PSYCINFO. (Commonly known as Psyc Abstracts) This file is produced by
the American Psychological Association and covers the world's literature
in psychology and related behavioral and social sciences such as psychiatry,
sociology, anthropology, education, pharmacology, and linguistics. The
following general fields are covered: applied psychology, educational
psychology, experimental human and animal psychology, experimental social
psychology, general psychology, personality, physical and psychological
disorders, physiological intervention, physiological pathology, professional
personnel and issues, psychometrics, social processes and issues, treatment
and prevention.

GPO. (Government Printing Office Monthly Catalog) This file is produced
by the Superintendent of Documents, United States Government Printing
Office and indexes the public documents generated by the legislative
branch, executive branch, and all agencies of the United States Federal
Government. Some publications from the judicial branch are also included.
The subjects covered are agriculture, commerce, defense, health and human
services, education, energy, housing, interior, justice, labor, state,
transportation, and treasury.

NTIS. (National Technical Information Service) This file is produced
by the National Technical Information Service of the U.S. Department of
Commerce. The data base consists of government-sponsored research,
development, and engineering reports as well as other analyses prepared by
government agencies, their contractors, or grantees. The following are
representative of the subject areas: administration and management;
aeronautics and aerodynamics; agriculture and food; astronomy and astrophysics;
atmospheric sciences; behavior and society; biomedical technology and
engineering; building industry technology; business and economics; chemistry;
civil engineering; communication; computers, control, and information theory;
electrotechnology; energy; environmental pollution and control; health
planning; industrial and mechanical engineering; library and information
sciences; materials sciences; mathematical sciences; medicine and biology;
military sciences; missile technology; natural resources and earth sciences;
navigation, guidance, and control; nuclear science and technology; ocean
technology and engineering; photography and recording devices; physics;
propulsion and fuels; space technology; transportation; urban and regional
technology.

ERIC. (Educational Resources Information Center) This data file is
produced by the National Institute of Education and covers the following
subject areas: adult, career, and vocational education; counseling and
personnel services; early childhood education; educational management;
handicapped and gifted children; higher education; information resources;
junior colleges; languages and linguistics; reading and communication
skills; rural education and small schools; science, mathematics, and
environmental education; social studies/social science education; teacher
education; tests, measurement, and evaluation; and urban education.
SSCI & SSCB. (Social SciSearch) These files are produced by the Institute
for Scientific Information (ISI) and constitute an international,
multidisciplinary index to the literature of the social, behavioral, and related
sciences. Subjects included in the data base are anthropology, archaeology,
area studies, business and finance, communication, community health,
criminology and penology, demography, economics, education research, ethnic
group studies, geography, history, information/library science,
international relations, law, linguistics, management, marketing, philosophy,
political science, psychology, psychiatry, sociology, statistics, and urban
planning and development.

SSIE. (Smithsonian Science Information Exchange) This file is produced by
the Smithsonian Science Information Exchange and contains abstracts of
research either in progress or completed in the past two years. The data
bases encompass all fields of basic and applied research in the physical,
social, engineering, and life sciences including: agricultural sciences,
behavioral sciences, biological sciences, chemistry and chemical engineering,
earth sciences, electronics, engineering materials, mathematics, medical
sciences, physics, social sciences and economics.

DTIC. (Defense Technical Information Center) This file is produced by the
Defense Logistics Agency. It makes available from one central repository
the thousands of research and development reports produced each year by
U.S. military organizations and their contractors and grantees. Defense
facilities and their contractors are required to submit to DTIC copies of
each report (up to and including SECRET) that formally records scientific
and technical results of Defense-sponsored research, development, test,
and evaluation. Although created originally to serve the military, DTIC
services have been extended to all federal government agencies and their
contractors, subcontractors, and grantees.
Article Code ________                         Reviewer Initials ________

                        ARTICLE REVIEW FORM

□ Article   □ Book/Monograph   □ Test Manual   □ Technical Report   □ Other

□ Check here if not reviewed; explain why below:

Predictor Review Form Codes: ________

□ Job Proficiency   □ Training Performance   □ Other
Description:
Development:
Reliability:  Value(s) ________
              Type and Method of Estimation:
Descriptive Statistics (N, X, S.D.):

□ Job Proficiency   □ Training Performance   □ Other
Description:
Development:
Reliability:  Value(s) ________
              Type and Method of Estimation:
Descriptive Statistics (N, X, S.D.):

Purpose:

Description:

                              Race
Sex:      W     B     Hisp     Asian     Am Ind     Other     Total
M
F
Total

Age:   X ____   S.D. ____   Range ____   Median ____
Educ:  X ____   S.D. ____   Range ____   Median ____
       (Explain, if scale: ________)

□ Criterion related: concurrent
□ Criterion related: predictive
□ Content validity
□ Factor Analytic or Psychometric
□ Reanalysis, review, or summary of data or past studies
□ Other: ________

Details of Methodology:

Opinions about research design, etc.
Predictor Code ________                       Reviewer Initials ________

                        PREDICTOR REVIEW FORM

Predictor Title:

Construct (Taxon):

Intended to measure:

Brief description of predictor:

Description of items/tasks:

Time Limit/Approx. Time:

Number of items/trials:

Power/Speeded:

Administration Procedures:

Scoring Procedures:

Publisher Name:                               Article Code: ________
APPENDIX C

Names and Definitions of Predictor and Criterion Variables
Used in Expert Judgment Task

List of 53 Predictor Variables Identified for
Inclusion in the Expert Judgment Task

PREDICTOR VARIABLES

Construct Name              Definition

Verbal Comprehension        Measures knowledge of the meaning of words and
                            their relationships to each other.

Numerical Computation       Measures speed and accuracy in performing simple
                            arithmetic operations, i.e., addition,
                            subtraction, multiplication, and division.

Use of Formulations         Measures the ability to correctly use algebraic
and Number Problems         formulae to solve number problems.

Word Problems               Measures the ability to select and organize
                            relevant information to correctly solve
                            mathematical word problems.

Reading Comprehension       Measures the ability to read and understand
                            written material.

Two-Dimensional Mental      Measures the ability to identify a
Rotation                    two-dimensional figure when seen at different
                            angular orientations within the picture plane.

Three-Dimensional Mental    Measures the ability to identify a
Rotation                    three-dimensional object, projected on a
                            two-dimensional plane, when seen at different
                            angular orientations either within the picture
                            plane or about the axis in depth.

Inductive Reasoning:        Measures the ability to discover a rule or
Concept Formation           principle and apply it in solving a problem.

Spatial Visualization       Measures the ability to mentally manipulate the
                            components of a two- or three-dimensional figure
                            into other arrangements.

Deductive Logic             Ability to use logic and judgment in drawing
                            conclusions from available information. Given a
                            set of facts and a set of conclusions, deductive
                            logic refers to the ability to determine whether
                            the conclusions flow logically from the facts.

Field Dependence            Ability to find a simple form when it is hidden
                            in a complex pattern. Given a visual percept or
                            configuration, field dependence (or independence,
                            more accurately) refers to the ability to hold it
                            in mind so as to disembed it from other
                            well-defined perceptual material.

Perceptual Speed and        Ability to perceive visual information quickly
Accuracy                    and accurately and to perform simple processing
                            tasks with it (e.g., comparisons). This requires
                            the ability to make rapid scanning movements
                            without being distracted by irrelevant visual
                            stimuli, and also measures memory, working speed,
                            and sometimes eye-hand coordination.
Mechanical Comprehension    Ability to learn, comprehend, and reason with
                            mechanical terms. More specifically, this is the
                            ability to perceive and understand the
                            relationship of physical forces and mechanical
                            elements in practical situations.

Rote Memory                 Measures the ability to recall previously learned
                            but unrelated item pairs.

Place Memory (Visual        Ability to remember the configuration, location,
Memory)                     and orientation of figural material.

Ideational Fluency          Ability to rapidly generate ideas about a given
                            topic or exemplars of a class of objects.

Follow Directions           Measures ability to follow simple and complex
                            directions.

Analogical Reasoning        Measures the ability to identify the underlying
                            principles governing relationships between pairs
                            of objects.

Figural Reasoning           Measures ability to generate and apply hypotheses
                            about principles governing the relationship among
                            several figures.

Spatial Scanning            Measures the ability to visually survey a complex
                            field to find a particular configuration
                            representing a pathway through the field.

Omnibus Measures of         Measures general mental ability or general
Intelligence/Aptitude       aptitude.

Word Fluency                Ability to rapidly think of words.

Verbal and Figural          Measures ability to identify objects or words
Closure                     given sketchy or partial information.

Processing Efficiency       Speed of reactions to simple stimuli.

Selective Attention         This is the ability to attend to a target
                            stimulus when presented with two or more stimuli
                            simultaneously.

Time-Sharing                Time-sharing is the ability to perform two or
                            more tasks simultaneously.

Multilimb Coordination      Multilimb coordination is the ability to
                            coordinate the simultaneous movement of two or
                            more limbs. This ability is general to tasks
                            requiring coordination of any two limbs (e.g.,
                            two hands, two feet, one foot and one hand). It
                            is most common to tasks where the body is at rest
                            (e.g., seated or standing) while two or more
                            limbs are in motion.
Control Precision           Control precision is the ability to make fine,
                            highly controlled (but not over-controlled)
                            muscular movements necessary to adjust or
                            position a machine or equipment control
                            mechanism. This ability is general to tasks
                            requiring motor adjustments in response to a
                            stimulus whose speed and/or direction of movement
                            are perfectly predictable. This ability is
                            critical in situations where the motor
                            adjustments must be both rapid and precise. The
                            ability extends to arm-hand movements as well as
                            to leg movements.

Rate Control                Rate control is the ability to make continuous
                            anticipatory muscular movements necessary to
                            adjust or position a machine or equipment control
                            mechanism. This ability is general to tasks
                            requiring motor adjustments or movements in
                            response to a moving stimulus which is changing
                            speed and/or direction in a random or
                            unpredictable manner. The ability applies to
                            compensatory tracking of the stimulus as well as
                            following pursuit of the stimulus.

Manual Dexterity            Manual dexterity is the ability to make skillful,
                            coordinated movements of the hand or the arm and
                            hand. This ability most typically applies to
                            tasks involving manipulation of moderately large
                            objects (e.g., blocks, pencils, etc.) under
                            speeded conditions.

Finger Dexterity            Finger dexterity is the ability to make skillful,
                            coordinated, highly controlled movements of the
                            fingers. This ability applies primarily to tasks
                            involving manipulation of objects with the
                            fingers.

Track Tracing Test          Designed to measure arm-hand steadiness.

Wrist-Finger Speed          The ability to carry out very rapid, discrete
                            movements of the fingers, hands, and wrists. This
                            ability applies primarily to tasks in which the
                            accuracy of the movement is not a major concern.
                            This ability is determined entirely by the speed
                            with which the movement is carried out.

Aiming                      The ability to make very precise, accurate hand
                            movements under highly speeded conditions. This
                            ability is dependent upon very precise eye-hand
                            coordination.

Speed of Arm Movement       This ability involves the speed with which
                            discrete arm movements can be made. The ability
                            deals with the speed with which the movement can
                            be carried out after it has been initiated.
Involvement in              Frequency and degree of participation in sports,
Athletics and               exercise, and physical activity. Individuals high
Physical Conditioning       on this dimension actively participate in
                            individual and team sports and/or exercise
                            vigorously several times per week.

Energy Level                Characteristic amount of energy and enthusiasm.
                            The person high in energy level is enthusiastic,
                            active, vital, optimistic, cheerful, zesty, and
                            has the energy to get things done.

Cooperativeness             Characteristic degree of pleasantness versus
                            unpleasantness exhibited in interpersonal
                            relations. The highly cooperative person is
                            pleasant, tolerant, tactful, helpful, not
                            defensive, and generally easy to get along with.
                            His/her participation in a group adds
                            cohesiveness.

Sociability                 Outgoingness. The person high in sociability is
                            talkative, relates easily to others, is
                            responsive and expressive in social environments,
                            readily becomes involved in group activities, and
                            has many relationships.

Traditional Values          Personal views in areas such as authority,
                            discipline, social change, and religious
                            commitment. The person with traditional values
                            accepts authority and the value of discipline, is
                            likely to be religious, values propriety, and is
                            conventional, conservative, and resistant to
                            social change.

Dominance                   Tendency to seek and enjoy positions of
                            leadership and influence over others. The highly
                            dominant person is forceful and persuasive at
                            those times when adopting such characteristics is
                            appropriate.

Self-esteem                 Degree of confidence in one's abilities. A person
                            with high self-esteem feels largely successful in
                            past undertakings and expects to succeed in
                            future undertakings.

Conscientiousness           Characteristic amount of behavioral self-control.
                            The highly conscientious person is dependable,
                            planful, well organized, and disciplined. This
                            person prefers order and thinks before acting.

Locus of Control            Characteristic belief in the amount of control
                            people have over rewards and punishments. The
                            person with an internal locus of control expects
                            that there are consequences associated with
                            behavior and that people control what happens to
                            them by what they do. The person with an external
                            locus of control believes that what happens to
                            people is beyond their personal control.
Emotional Stability         Characteristic degree of stability vs. reactivity
                            of emotions. The emotionally stable person is
                            generally calm, displays an even mood, and is not
                            overly distraught by stressful situations. He/she
                            thinks clearly and maintains composure and
                            rationality in situations of actual or perceived
                            stress.

Nondelinquency              Amount of respect for laws and regulations as
                            manifested in attitudes and behavior. The
                            nondelinquent person is honest, trustworthy,
                            wholesome, and law-abiding. Such persons will
                            have histories devoid of trouble with schools and
                            legal agencies.

Work Orientation            Tendency to strive for competence in one's work.
                            The work-oriented person works hard, sets high
                            standards, tries to do a good job, endorses the
                            work ethic, and concentrates on and persists in
                            completion of the task at hand.

Realistic Interests         Preference for concrete and tangible activities,
                            characteristics, and tasks. Persons with
                            realistic interests enjoy, and are skilled in,
                            the manipulation of tools, machines, and animals,
                            but find social and educational activities and
                            situations aversive.

Investigative Interests     Preference for scholarly, intellectual, and
                            scientific activities and tasks. Persons with
                            investigative interests enjoy analytical,
                            ambiguous, and independent tasks, but dislike
                            leadership and persuasive activities.

Enterprising Interests      Preference for persuasive, assertive, and
                            leadership activities and tasks. Persons with
                            enterprising interests may be characterized as
                            ambitious, dominant, sociable, and
                            self-confident.

Artistic Interests          Preferences for unstructured, expressive, and
                            ambiguous activities and tasks. Persons with
                            artistic interests may be characterized as
                            intuitive, impulsive, creative, and
                            non-conforming.

Social Interests            Preferences for social, helping, and teaching
                            activities and tasks. Persons with social
                            interests may be characterized as responsible,
                            idealistic, and humanistic.

Conventional Interests      Preferences for well-ordered, systematic, and
                            practical activities and tasks. Persons with
                            conventional interests may be characterized as
                            conforming, unimaginative, efficient, and calm.
CRITERION CONSTRUCTS

 1. Inspect mechanical systems -- measure, and/or use diagnostic
    equipment as well as visual, aural and tactile senses, in
    conjunction with technical information, to compare the operating
    status of mechanical equipment (e.g., engines, transmissions,
    machineguns) and mechanical components (e.g., bearings in an
    electrical generator) to standards of operating efficiency, and
    to identify malfunctions.

    Actions may include: analyze, read, operate

 2. Troubleshoot mechanical systems -- use test, measuring, and
    diagnostic equipment, in conjunction with technical information,
    to determine the cause of malfunctions in mechanical equipment
    (e.g., engines, transmissions, machineguns) and mechanical
    components (e.g., bearings in an electrical generator).

    Actions may include: analyze, read, calculate

 3. Repair mechanical systems -- perform corrective actions on
    previously diagnosed malfunctions of mechanical equipment or
    mechanical components using appropriate tools (e.g., wrenches,
    screwdrivers, gauges, hammers) in conjunction with technical
    information.

    Actions may include: adjust, assemble/disassemble, install, fix,
    read, work metal

 4. Inspect fluid systems -- use test, measuring, and diagnostic
    equipment, as well as visual, aural and tactile senses, in
    conjunction with technical information, to determine the
    operating status of fluid systems (e.g., hydraulic,
    refrigeration, engine cooling, compressed air) in comparison to
    standards of operating efficiency, and to identify malfunctions.

    Actions may include: analyze, read, operate

 5. Troubleshoot fluid systems -- use test, measuring and diagnostic
    equipment, in conjunction with technical information, to
    determine the cause of malfunctions in fluid systems (e.g.,
    hydraulic, refrigeration, engine cooling, compressed air).

    Actions may include: analyze, read, calculate

 6. Repair fluid systems -- perform corrective actions on previously
    diagnosed malfunctions of fluid systems using appropriate tools
    (e.g., wrenches, pressure gauges, soldering equipment) in
    conjunction with technical information.

    Actions may include: adjust, assemble/disassemble, install, fix,
    read
 7. Inspect electrical systems -- use test, measuring, and diagnostic
    equipment, as well as visual, aural and tactile senses, in
    conjunction with technical information, to determine the
    operating status of electrical systems (e.g., generators, wiring
    harnesses, switches, relays, circuit breakers, motors, lights) in
    comparison to standards of operating efficiency and to identify
    malfunctions.

    Actions may include: analyze, read, operate

 8. Troubleshoot electrical systems -- use test, measuring and
    diagnostic equipment, in conjunction with technical information,
    to determine the cause of malfunctions in electrical systems
    (e.g., generators, wiring harnesses, switches, relays, circuit
    breakers, motors, lights).

    Actions may include: analyze, read, calculate

 9. Repair electrical systems -- perform corrective actions on
    previously diagnosed malfunctions of electrical systems and
    electrical components using appropriate tools (e.g., pliers, wire
    strippers, soldering irons) in conjunction with technical
    information.

    Actions may include: adjust, assemble/disassemble, install, fix,
    read

10. Inspect electronic systems -- use test, measuring and diagnostic
    equipment, and to a limited extent, visual, aural, and tactile
    senses, in conjunction with technical information, to compare the
    operating status of electronic systems (e.g., communications
    equipment, radar, missile and tank ballistics controls) to
    standards of operating efficiency and to identify malfunctions.

    Actions may include: analyze, read, operate

11. Troubleshoot electronic systems -- use test, measuring, and
    diagnostic equipment, in conjunction with technical information,
    to determine the cause or location of malfunctions in electronics
    systems (e.g., communication equipment, radar, missile and tank
    ballistics controls).

    Actions may include: analyze, read, calculate

12. Repair electronic systems -- perform corrective actions on
    previously diagnosed malfunctions of electronic systems and
    electronic components using appropriate tools (e.g., test sets,
    screwdrivers, pliers, soldering guns) in conjunction with
    technical information.

    Actions may include: adjust, assemble/disassemble, install, fix,
    read
13. Repair metal -- perform corrective actions (e.g., bend, cut, drill,
    saw, weld, rivet, hammer, [illegible], solder, paint) to
    refabricate metal structures.

    Actions may include: calculate, assemble/disassemble, fix,
    construct, read, work metal

14. Repair plastic and fiberglass structures -- perform corrective
    actions (e.g., measure, cut, saw, drill, sand, fill, paint, glue)
    to refabricate plastic and fiberglass structures.

    Actions may include: calculate, assemble/disassemble, fix,
    construct, read

15. Construct wooden buildings and other structures -- perform
    carpentry activities (e.g., measure, saw, nail, plane) to frame,
    sheath and roof buildings, or to erect trestles, bridges, piers,
    etc.

    Actions may include: calculate, assemble/disassemble, install,
    construct, read

16. Construct masonry buildings and structures -- perform masonry
    activities (e.g., measure, lay brick, pour concrete) to construct
    walls, columns, field fortifications, etc.

    Actions may include: construct, calculate, assemble/disassemble,
    read

17. Prepare parachutes -- inspect cargo and personnel parachutes,
    repair or replace faulty parachute components, and prepare (i.e.,
    pack) parachute for future air drop.

    Actions may include: adjust, assemble/disassemble, pack/unpack,
    fix, sew, read

18. Prepare equipment and supplies for air drop -- fabricate and
    assemble platforms, cushions, and rigging to parachute supplies,
    equipment and vehicles; load, position and secure supplies and
    equipment in aircraft.

    Actions may include: adjust, assemble/disassemble, pack/unpack,
    construct, transport

19. Install electronic components -- place and interconnect electronic
    and communication components and equipment (e.g., radios,
    antennas, telephones, teletypewriters, radar, power supplies) and
    check system for operation.

    Actions may include: adjust, assemble/disassemble, install, read
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20* Operate electronic equipaent~8et end adjust the controls of 

electronic components to operate electronic systems (e«g«t radio » 
radar 9 computer hardware » missile ballistics controls)* 

Actions may Include: adjust , operate 

21* Send and receive radio messages~use standardized radio codes and 
procedures to transmit and receive information. 

Actions may Include: signal, communicate, read 

22. Operate keyboard device~type information using a typewriter, teletype 
or keypunch, or computer termixul* 

Actions may Include: process, operate 

23« Use maps in the f ield~read and interpret map symbols and identify 
geography features in order to locate geography features and 
field positions on the map, and to locate map features In the 
field. 

Actions may include: analyze, identify, read, calculate 

24. Plan placement or use of tactical position and features -- using maps
and on-site inspection, identify geographic positions or areas to
be used for cover and concealment or to place fortifications,
mines, detectors, chemicals, etc.

Actions may include: analyze, calculate, read

25. Place tactical equipment and materials in the field -- without using
heavy equipment (e.g., lifts, dozers), place mines, detectors,
chemicals, camouflage or other tactical items into position on 
the battlefield. 

Actions may include: use weapons, maneuver, transport, install 

26. Detect and identify targets -- using primarily sight, with or without
optical systems, locate potential targets, and identify type 
(e.g., tanks, troops, artillery) and threat (friend or foe); 
report information. 

Actions may include: communicate, analyze 

27. Prepare heavy weapons for tactical use -- transport, position and
assemble heavy tactical weapons such as missiles, field
artillery, anti-aircraft systems.

Actions may include: adjust, assemble/disassemble, install, 
pack/unpack 
28. Load field artillery or tank guns -- manipulate breech controls and
handle ammunition (stow and load) to prepare guns for firing.

Actions may include: use weapons, pack/unpack 

29. Fire heavy direct fire weapons (e.g., tank main guns, TOW missile,
infantry fighting vehicle cannon) -- using optical sighting
systems, manipulate weapon system controls to aim, track and fire
on designated targets.

Actions may include: use weapons, operate, adjust

30. Operate fire controls of indirect fire weapons (e.g., field
artillery) -- using map coordinates and ballistics information,
determine elevation and azimuth needed for firing at designated
targets; adjust weapon using fire controls.

Actions may include: analyze, calculate, read, adjust 

31. Fire individual weapons -- aim, track and fire hand operated weapons
such as rifles, pistols, and machineguns at designated targets.

Actions may include: use weapons 

32. Engage in bayonet and hand-to-hand combat -- use offensive and defensive
body maneuvers to subdue hostile individuals.

Actions may include: maneuver, apprehend

33. Operate wheeled vehicles — use various vehicle controls to drive 

wheeled vehicles from point to point, generally over paved and
unpaved roads, observe traffic regulations; secure cargo. 

Actions may include: maneuver, transport, operate 

34. Operate track vehicles — use various vehicle controls to drive track 

vehicles (e.g., tanks, APCs, scout vehicles, bulldozers); steer
in response to terrain features.

Actions may include: maneuver, transport, operate

35. Operate lifting, loading and grading equipment — operate heavy 

equipment (e.g., fork lifts, cranes, loader, back-hoes, graders) 
to load, unload, or move heavy equipment, supplies, construction 
materials (e.g., culvert pipes, building or bridge trusses), or
terrain features (e.g., earth, rock, trees). 

Actions may include: construct, operate 






36. Operate power excavating equipment -- use pneumatic hammers and drills,
paving breakers, grinders, and backfill tampers, in the
fabrication and modification of concrete, stone and earthen
structures.

Actions may include: construct, operate

37. Reproduce printed materials -- operate duplicating machines and offset
presses to reproduce printed materials; collate and bind
materials using various types of bindery equipment.

Actions may include: adjust, operate, photograph, calculate

38. Make movies and videotapes -- use motion picture cameras or videotape
equipment to record visual and auditory aspects of assigned
subject matter to be used for intelligence analyses, training or
documentation. 

Actions may include: adjust, photograph 

39. Draw maps and overlays -- use drafting, graphics, and related
techniques to prepare and revise maps, with symbols and legends,
from aerial photographs.

Actions may include: analyze, process, draw 

40. Write and deliver presentations -- prepare scripts for formal
presentation including radio and television broadcast; make oral
presentations.

Actions may include: analyze, write 

41. Record and file information -- collect, transcribe, annotate, sort,
index, file, and retrieve information (e.g., training rosters,
personnel statistics, supply inventories).

Actions may include: process, dispose 

42. Receive, store and issue supplies, equipment and other materials --
inspect material and review paperwork upon receipt; sort,
transport, and store material; issue or ship material to
authorized personnel or units.

Actions may include: analyze, calculate, process, send,
pack/unpack, transport

43. Prepare technical forms and documents -- follow standardized procedures
to prepare or complete forms and documents (e.g., personnel
records and dispositions, efficiency reports, legal briefs).

Actions may include: process, write, analyze
44. Translate or decode data -- use standardized coding systems and decoding
rules to convert coded information to some more usable form
(e.g., interpret radar information, decode Morse code, translate
foreign languages).

Actions may include: analyze 

45. Analyze intelligence data -- determine importance and reliability of
information; integrate information to provide identification,
disposition and movement of enemy forces and estimate enemy
capabilities.

Actions may include: communicate, analyze, read

46. Prepare food -- prepare food and beverages according to recipes and meal
plans (measure, mix, bake, etc.); inspect fresh food and staples
for freshness; maintain sanitary work area.

Actions may include: cook, read, sanitize, dispose, calculate

47. Receive clients, patients, guests -- schedule, greet and give routine
information to persons seeking medical, dental, legal or
counseling services.

Actions may include: administer, communicate, process 

48. Interview -- verbally gather information from clients, patients,

witnesses, prisoners, or other persons. 

Actions may include: communicate 

49. Provide medical and dental treatment -- give medical attention to
soldiers in the field, or medical or dental clinic, or to animals
(e.g., CPR, splinting fractures, administering injections,
dressing wounds).

Actions may include: treat, sanitize, photograph 

50. Select, lay-out and clean medical or dental equipment and supplies --
prepare treatment areas for use by following prescribed
procedures for laying-out instruments and equipment; clean
equipment and area for subsequent use.

Actions may include: sanitize, assemble/disassemble, 
pack/unpack, dispose 

51. Perform medical laboratory procedures -- conduct various types of blood
tests, urinalysis, cultures, etc.

Actions may include: sanitize, analyze, calculate, adjust



52. Control individuals or crowds -- apprehend suspected criminals, capture
enemy soldiers, guard prisoners, participate in riot control
operations, etc.

Actions may include: apprehend, communicate, administer

53. Control air traffic -- coordinate departing, en route, arriving and
holding aircraft by monitoring radar equipment and communicating
with aircraft and other air traffic control facilities.

Actions may include: communicate, analyze, send, operate, signal



Initial Training Performance Variables



1. Training progress/success -- successfully completing formal training
course in normal amount of time versus washing out, being
reassigned, being "set back" or "recycled."



2. Effort/motivation in training -- the degree of effort, motivation, and
interest that a soldier puts into his/her training, as evidenced
by such things as curiosity about course content, not being
afraid to be "wrong" or to ask questions, taking notes, being
attentive in class, studying on own time, seeking out the
instructor to clarify course content.



3. Performance of theoretical or "classroom" parts of training --
learning the theoretical part of a course; performing well
on quizzes, tests, and examinations given in a classroom
setting that tests the acquisition of concepts, principles,
facts, or other information, e.g., learning the basic food
groups, understanding the principles of internal combustion,
learning the nomenclature of a weapon.



4. Performance of practical, "hands-on" part of training -- applying
the theory or principles of a course to practical problems
and situations, either during simulations, field exercises,
or other "hands-on" parts of training, e.g., cooking a meal,
repairing an engine, firing a weapon, etc.
Nine Behavioral Dimensions of
Generalized Army Effectiveness



1. Following regulations -- consistently complying with Army rules and
regulations; conforming appropriately to standard procedures;
following the spirit as well as the letter of military and
civilian laws, regulations, written orders, etc.

2. Commitment to Army norms -- adjusting successfully to Army life;
displaying appropriate military appearance and bearing; showing
pride in being a soldier.

3. Cooperation with supervisors -- responding willingly to orders,
suggestions, and other guidance from NCOs and officers; deferring
appropriately to superiors' expertise and judgment and being
supportive of superior officers/NCOs.

4. Cooperation with other unit members -- pitching in when necessary to
help other unit members with their job and mission assignments
or during training; encouraging and supporting other unit members,
as appropriate; showing concern for unit objectives over and
above personal interests.

5. Hard work and perseverance -- working hard on the job and during training;
sustaining maximum effort over long periods of hard duty and on
daily assignments; coping well with hardship or otherwise unpleasant
conditions to continue to work toward mission completion.

6. Attention to detail -- carrying out assignments carefully and thoroughly;
consistently completing job and duty assignments on time or ahead
of schedule; being conscientious in maintaining own and unit's
equipment, and taking care to ensure that own quarters are clean
and neat.

7. Initiative -- willingly volunteering for assignments; performing extra
necessary tasks without explicit orders; anticipating problems
and taking action to prevent them.



8. Discipline -- consistently concentrating on the job or duty assignment
rather than being distracted by opportunities to socialize or
otherwise stop working; controlling own emotions and not allowing
them to interfere with performance of duty; keeping under control
alcohol and other drug intake so that performance is not affected.

9. Emergent leadership -- displaying good judgment in making suggestions
to others in the unit regarding the job, duty assignments, etc.;
appropriately taking charge when placed in a leadership position;
where appropriate, persuading others in the unit to accept
his/her ideas, opinions, and directions.
Six General Army Effectiveness Variables



10. Survive in the field -- react to direct or indirect fire; construct
individual fighting position; camouflage self and equipment;
use challenge and password; protect against NBC attack.



11. Maintain physical fitness -- keep self at physical fitness level
appropriate for state of battle readiness.



12. Disciplinary problems -- having a record of disciplinary problems as
reflected by AWOLs, Article 15s, civil arrests, etc.



13. Attrition -- separating from the Army for "negative" reasons such as
discipline or drug-related problems.



14. Reenlistment -- signing on for a second tour of duty.



15. Job satisfaction/morale— being satisfied with own MOS and Army life. 
APPENDIX D

Scale Names and Number of Items in Each Scale 
for the Preliminary Battery 






The scale names, with the number of items each included parenthetically, are
as follows:

Perceptual-cognitive: ETS Figure Classification (FC: 28 items with 8
responses each); ETS Map Planning (MP: 40); ETS Choosing a Path (CP: 32);
ETS Following Directions (FD: 20); ETS Hidden Figures (HF: 32); EAS Space
Visualization (SV: 50); EAS Numerical Reasoning (NR: 20); Flanagan Assembly
(FMA: 20).

Vocational interests (VOICE): Office Administration (20); Heavy
Construction (20); Electronics (20); Medical Service (20); Outdoors (15);
Aesthetics (15); Mechanics (15); Food Services (15); Law Enforcement (15);
Agriculture (15); Mathematics (12); Audiographics (10); Teacher/Counseling
(10); Marksman (7); Drafting (7); Craftsman (7); Automated Data Processing
(7).

Temperament (Personnel Opinion Inventory or POI): Conscientiousness (DPQ 
Unlikely Virtues/PRF Infrequency: 10); Leadership (DPQ Social Potency: 26); 
Stress (DPQ Stress Reaction: 26); Discipline (CPI Socialization: 30);
Motivation (Rotter I/E Locus of Control: 29).

Biographical Questionnaire (BQ): Scales for Males. Warmth of Parental
Relationship (11); Academic Achievement (25); Social Introversion (22);
Athletic Interest (10); Intellectualism (18); Aggressive/Independence (10); 
Parental Control vs. Freedom (11); Social Desirability (10); Scientific 
Interest (12); Academic Attitude (8); Sibling Friction (5). 

Scales for Females. Warmth of Maternal Relationship (13); Social
Leadership (22); Academic Achievement (13); Parental Control vs. Freedom (11);
Cultural Literary Interests (5); Athletic Participation (9); Scientific 
Interest (13); Feelings of Social Inadequacy (3); Adjustment (5); Expression 
of Negative Emotion (4); Social Maturity (2); Popularity with Opposite Sex 
(4); Positive Academic Attitude (7); Warmth of Parental Relationship (5). 

Rational (Combined Sex) Scales: Leadership (12); Social Confidence
(D; Social Activity (11); Self Control (5); Antecedents of Self Esteem
(6); Parental Closeness (13); Sibling Harmony (5); Independence (8);
Academic Confidence (5); Academic Achievement (6); Positive Academic Attitude
(6); Effort (4); Scientific Interests (5); Reading/Intellectual Interests 
(6); Athletic Interests (2); Athletic/Sports Participation (6); Physical 
Condition (18); Vocational-Technical Activities (4). 
APPENDIX E

Computerized Measures Observed During Site Visits 
for ARI Project A, Spring 1983 
COMPUTERIZED MEASURES OBSERVED
DURING SITE VISITS FOR ARI PROJECT A

PREDICTOR    LOCATION    MACHINE    PROGRAMMING LANGUAGE

PERCEPTUAL

Simple Reaction Time 
Choice Reaction Time (2-6) 
Posner Physical Identity 
Posner Name Identity 
Single Word Classification 

Comparison of Word Prs. 
Line Length Judgments 
Visual Search 
Rotated Figures 
Perceptual Speed 



Dot Estimation
Mental Rotation
Decision Making Speed (CRT)
Embedded Figures
Card Rotation^
Hidden Patterns^
Maze Training^
Perceptual Speed Test 



INFORMATION PROCESSING 

Sternberg Numbers 

Sternberg Words 

Old-New Item Recognition

Random Two Responses

Nine Digit Short Term Memory 

Continuous Paired Assoc. 
Dual Task-Tapping & Visual 
Visual Memory (5x5) 
Time Sharing: Compensatory 
Tracking & Digit Cancellation 



^These measures administered under NAMRL contract at the Aviation Research
Laboratory in Illinois. 
INFORMATION PROCESSING (CONT.)

Encoding Speed 
Immediate/Delayed Memory
Item Recognition
Time Sharing: Compensatory Tracking & Arithmetic^

Selective Attention (DLT)^
Time Sharing: Stick & Rudder & DLT

Sternberg Memory Search Tasks 1-A
Delayed Digit Recall^
Time Sharing: Compensatory 
Tracking & CRT 



COGNITIVE

Numerical Operations
Sentence Verification
Paired Assoc. Learning
Moyer-Landauer Task
Relearning of Paired Assoc.

Three Term Comparisons
Similarity Judgments
Days of Week Addition
Simon-Kotovsky Task
Word-Nonword Comparison

Collins & Quillian
Adaptive Vocabulary
Thurstone's ABC
Risk Taking
Word Knowledge
Computer Panel Test
^NAMRL is in the process of adapting these to an Apple computer with joy
stick, foot pedals and a speech generation chip.

^These measures administered under NAMRL contract at the Aviation Research 
Laboratory in Illinois. 
NON-COGNITIVE
Activities Interest Inventory 

PSYCHOMOTOR

Two-Handed Coordination
Complex Coord./Stick & Rudder^
Complex Coordination^
Tank Video Game^
One-Dimensional Compensatory Tracking^

Critical Tracking
Two-Dimensional Compensatory Tracking
Kinesthetic Memory
Helicopter Simulator
Tank Turret Simulator

Perceptronics Simulator
Gunner Tracking Task (using the Willey "Burst-on-Target" Simulator)
Target Acquisition Task (using the Willey "Burst-on-Target" Simulator)




^NAMRL is currently adapting the Complex Coordination (using two hands)
to the PDP 11.

^NAMRL is currently adapting this to an Apple computer with joy stick and
foot pedals.

^Developed under contract with ARI; work being carried out at Pensacola.

^These measures administered under NAMRL contract at the Aviation Research
Laboratory in Illinois.
PSYCHOMOTOR (CONT.)



Fire Control Computer Task^
(Using the Chrysler Corp.
Fire Control Combat Simulator)

Round Sensing Task^ (Using
several different pieces of
equipment including T-scope,
3 projectors, Allen Device,
etc.)

Computerized Target Engagement
(also using 35 mm film, slides,
and video equipment)
Psychomotor Tracking Task







^These measures may be more appropriately categorized elsewhere, e.g.,
Perceptual or Information Processing (Figure memory) for the Round Sensing
Task, but have been placed here due to the type of equipment required.